Dynamic Texture Synthesis

Author

Roberto Costantini

Abstract

Videos of smoke, flames, flowing water, moving grass, and similar phenomena show a quasi-periodic repetition of some basic pattern. The periodicity is not perfect, since each video frame actually differs from the others, but it is strong enough that we perceive a certain regularity.
The aim of this research is to find a model for this type of video, that is, a formal (mathematical) description that explains how each frame is generated. Since the physical process that drives fire or water is very complex, we look for a simple model based on the video itself, considered as a collection of images, rather than on the physics of the real process.
Once this description is available, we can create longer video sequences simply by producing new frames with the model. This process is called synthesis, and the images it creates are called synthetic.
A simple model that describes dynamic textures efficiently already exists: the linear model of Doretto et al. [1], in which the dynamic texture is treated as a linear dynamical system whose parameters are estimated with methods borrowed from the system identification community. The analysis consists in representing each image as a point in a low-dimensional subspace and in identifying the trajectory of these points over time. Synthesis is obtained by driving the identified system with white noise.
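
To make this concrete: writing y(t) for the vectorized frame at time t and x(t) for its low-dimensional state, the linear model takes the form x(t+1) = A x(t) + v(t), y(t) = C x(t) + w(t), where v and w are noise terms. The following is a minimal Python/NumPy sketch in the spirit of the closed-form, SVD-based identification of [1]; the function names and the small regularization constant are our own, and details such as the handling of the measurement noise w are simplified compared to the paper.

    import numpy as np

    def learn_lds(Y, n):
        """Fit x(t+1) = A x(t) + v(t), y(t) = C x(t) + w(t) to frames Y.

        Y : (p, tau) matrix whose columns are the vectorized frames.
        n : state dimension (number of principal components kept).
        """
        # Subtract the temporal mean so the subspace captures variation.
        mean = Y.mean(axis=1, keepdims=True)
        Y0 = Y - mean
        # C spans the image subspace; X holds the per-frame coordinates.
        U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
        C = U[:, :n]
        X = np.diag(s[:n]) @ Vt[:n, :]
        # Least-squares estimate of the state-transition matrix A.
        A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
        # Covariance of the state innovation drives the synthesis noise.
        V = X[:, 1:] - A @ X[:, :-1]
        Q = V @ V.T / (X.shape[1] - 1)
        return A, C, Q, mean, X[:, 0]

    def synthesize(A, C, Q, mean, x0, n_frames, seed=None):
        """Generate new frames by driving the model with white noise."""
        rng = np.random.default_rng(seed)
        # Small jitter keeps the Cholesky factorization well defined.
        L = np.linalg.cholesky(Q + 1e-8 * np.eye(Q.shape[0]))
        x, frames = x0, []
        for _ in range(n_frames):
            frames.append(C @ x + mean.ravel())
            x = A @ x + L @ rng.standard_normal(x.shape)
        return np.stack(frames, axis=1)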
In this research, we used this model to analyze different videos and proposed improvements that make the model more compact, and thus suitable for architectures with significant computational and memory constraints. First, we used YCbCr instead of RGB color encoding for each video frame. This halved the memory needed to store the model coefficients, because a signal in YCbCr encoding can be compressed before the analysis, by subsampling its chroma channels, without much loss of quality.
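
As an illustration of why YCbCr helps, the sketch below (helper names are ours; ITU-R BT.601 full-range coefficients assumed) converts an RGB frame to YCbCr and subsamples each chroma plane by averaging 2x2 blocks. A full-resolution luma plane plus two quarter-size chroma planes costs 1.5 values per pixel instead of 3, i.e. half the memory, and the perceptual loss is small because the eye is less sensitive to chroma detail.

    import numpy as np

    def rgb_to_ycbcr(rgb):
        """BT.601 full-range conversion; rgb is (H, W, 3) float in [0, 1]."""
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        y  =  0.299 * r + 0.587 * g + 0.114 * b
        cb = -0.168736 * r - 0.331264 * g + 0.5 * b
        cr =  0.5 * r - 0.418688 * g - 0.081312 * b
        return y, cb, cr

    def subsample_420(chroma):
        """Average each 2x2 block: 4:2:0 subsampling, keeping 1/4 of the samples."""
        h, w = chroma.shape[0] // 2 * 2, chroma.shape[1] // 2 * 2
        c = chroma[:h, :w]
        return 0.25 * (c[0::2, 0::2] + c[0::2, 1::2]
                       + c[1::2, 0::2] + c[1::2, 1::2])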
Second, we considered a different signal decomposition in the analysis step, one that takes into account the spatial relationships between image pixels. Specifically, we proposed to decompose the multidimensional (tensor) signal representing a color video sequence directly, using a tensor decomposition technique. We showed that decomposition techniques originally developed to study psychometric or chemometric data can be used for this purpose. Since spatial, temporal, and color information are analyzed jointly, such techniques yield very compact models.
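
As a sketch of what such a direct decomposition can look like, the code below computes a truncated higher-order SVD (a Tucker-type decomposition, one of the tensor techniques from the psychometrics literature); it illustrates the general idea and is not necessarily the exact algorithm used in this project. A color video stored as a (height, width, 3, frames) tensor is compressed into a small core tensor plus one factor matrix per mode, so spatial, color, and temporal structure are captured at the same time.

    import numpy as np

    def unfold(T, mode):
        """Mode-n unfolding: the mode-n fibers become the columns of a matrix."""
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def hosvd(T, ranks):
        """Truncated higher-order SVD (Tucker): core G and factor matrices U_n."""
        Us = []
        for mode, r in enumerate(ranks):
            U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
            Us.append(U[:, :r])
        G = T
        for mode, U in enumerate(Us):
            # Contract the tensor with U^T along each mode to get the core.
            G = np.moveaxis(np.tensordot(G, U.T, axes=(mode, 1)), -1, mode)
        return G, Us

    # Example: a 64x64 RGB clip of 100 frames reduced to a 20x20x3x15 core.
    # video = np.random.rand(64, 64, 3, 100)
    # G, Us = hosvd(video, ranks=(20, 20, 3, 15))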

[Figure] Example of dynamic texture synthesis. Right: real texture frame; left: synthesized texture frame.


[1] G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto, "Dynamic Textures", International Journal of Computer Vision, 51(2): 91-109, 2003.


Collaborations

Luciano Sbaiz
Sabine Süsstrunk

Publications

Funding

This project is supported by the Swiss National Science Foundation (SNF) under grant number 21-067012.01.