Videos representing smoke, flames, flowing water, moving grass, and so on show a periodic repetition of some
basic pattern. The periodicity is not perfect, since each video frame differs from the others, but it is sufficient
for a viewer to perceive a certain regularity.
The aim of this research is to find a model for this type of video, that is, a formal (mathematical) description
that explains how each frame is generated. Since the real physical process that drives the generation
of fire or water is very complex, we aim at a simple model based on the video considered as a collection of images, not
on the physics of the real process.
Once this description is available, we can create longer video sequences by simply producing new frames with the model.
This process is called synthesis, and the images created are called synthetic.
A simple model that describes dynamic textures efficiently already exists:
the linear model of Doretto et al. [1], where the dynamic texture is treated as a dynamical system and its properties are
modelled using methods borrowed from the system identification community. The analysis consists of representing each image as a point
in a given subspace and identifying its trajectory. The synthesis is obtained by driving the system with white noise.
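To make the analysis/synthesis pipeline concrete, here is a minimal numpy sketch in the spirit of [1]: frames are projected onto a subspace via the SVD, the state transition is identified by least squares, and new frames are generated by driving the system with white noise. The function names, the model order n, and the noise parameterization are illustrative assumptions, not the exact implementation used in the project.

```python
import numpy as np

def learn_dynamic_texture(Y, n):
    """Y: (p, tau) matrix whose columns are tau vectorized frames.
    n: dimension of the hidden state (model order)."""
    # Represent each frame as a point in an n-dimensional subspace (SVD/PCA).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                        # subspace basis (observation matrix)
    X = np.diag(s[:n]) @ Vt[:n, :]      # state trajectory, one column per frame
    # Identify the state transition by least squares: X[:, 1:] ~ A X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    # Model the fitting residuals as white noise driving the system.
    E = X[:, 1:] - A @ X[:, :-1]
    B = E / np.sqrt(E.shape[1])         # so that B B^T is the residual covariance
    return C, A, B, X[:, -1]

def synthesize(C, A, B, x0, n_frames, rng=np.random.default_rng(0)):
    """Generate new (synthetic) frames by driving the system with white noise."""
    x, frames = x0, []
    for _ in range(n_frames):
        x = A @ x + B @ rng.standard_normal(B.shape[1])
        frames.append(C @ x)            # map the state back to image space
    return np.stack(frames, axis=1)
```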
In this research we used this model to analyze different videos and proposed improvements that make the model more compact and thus
suitable for architectures where computational and memory constraints are significant. First, we used the YCbCr color encoding instead of RGB
for each color frame. This halved the memory needed to store the model coefficients, because the input signal in YCbCr
encoding can be compressed before the analysis without much loss of quality.
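The following sketch shows one way such a compression could work before the analysis: convert RGB to YCbCr (BT.601) and subsample the chroma channels over 2x2 blocks (4:2:0), which reduces storage from 3 samples per pixel to 1.5, i.e. half. The exact subsampling scheme used in the project is not specified here, so 4:2:0 block averaging is an assumption.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion (values in [0, 255])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def subsample_420(y, cb, cr):
    """Keep luma at full resolution, average chroma over 2x2 blocks (4:2:0),
    halving the number of coefficients to store."""
    h, w = y.shape
    cb_sub = cb[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))
    cr_sub = cr[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub
```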
Second, we considered a different signal decomposition in the analysis step, one that takes into account the spatial
relationship between image pixels. Specifically, we proposed to decompose the multidimensional (tensor)
signal that represents a color video sequence directly, using a tensor decomposition technique.
We showed that decomposition techniques originally developed to study psychometric or chemometric data can be
used for this purpose. Since spatial, temporal, and color information are analyzed at the same time, such techniques
yield very compact models.
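The text points to Tucker/PARAFAC-style methods (the decompositions used on psychometric and chemometric data) without naming one, so as an illustration here is a minimal numpy sketch of a truncated higher-order SVD, a standard way to compute a Tucker decomposition of the video tensor (height x width x color x time). The ranks, the mode ordering, and the function names are assumptions for the sketch.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated higher-order SVD (a Tucker decomposition) of a tensor T,
    with one rank per mode, e.g. T of shape (height, width, color, time)."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    # Core tensor: project T onto every mode's subspace.
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors
```

The compact model then consists of the small core tensor plus one factor matrix per mode, so spatial, temporal, and color redundancy are compressed jointly rather than frame by frame.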
Figure: example of dynamic texture synthesis. Left: synthesized texture frame; right: real texture frame.
[1] G. Doretto, A. Chiuso, S. Soatto, and Y.N. Wu, "Dynamic Textures", International Journal of Computer Vision, 51(2): 91-109, 2003.
Luciano Sbaiz
Sabine Süsstrunk
This project is supported by the Swiss National Science Foundation (SNF) under grant number 21-067012.01.