A novel video representation, the layered dynamic texture (LDT), is proposed. The LDT is a generative model, which represents a video as a collection of stochastic layers of different appearance and dynamics. Each layer is modeled as a temporal texture sampled from a different linear dynamical system. The LDT model includes these systems, a collection of hidden layer assignment variables (which control the assignment of pixels to layers), and a Markov random field prior on these variables (which encourages smooth segmentations). An EM algorithm is derived for maximum-likelihood estimation of the model parameters from a training video. It is shown that exact inference is intractable, a problem which is addressed by the introduction of two approximate inference procedures: a Gibbs sampler and a computationally efficient variational approximation. The trade-off between the quality of the two approximations and their complexity is studied experimentally. The ability of the LDT to segment videos into layers of coherent appearance and dynamics is also evaluated, on both synthetic and natural videos. These experiments show that the model possesses an ability to group regions of globally homogeneous, but locally heterogeneous, stochastic dynamics currently unparalleled in the literature.
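The generative structure described above can be illustrated with a minimal sketch: each layer evolves its own hidden state under a linear dynamical system, and each pixel observes the state of the layer it is assigned to. All parameter values, dimensions, and noise scales below are hypothetical; the layer assignments are fixed rather than sampled from the Markov random field prior of the full model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels, n_frames, state_dim = 6, 5, 2

# Hypothetical parameters for two layers: each layer j has its own
# linear dynamical system with transition matrix A[j] and a per-pixel
# observation matrix C[j].
A = [np.array([[0.9, -0.1], [0.1, 0.9]]),   # layer 0 dynamics
     np.array([[0.5, 0.0], [0.0, 0.5]])]    # layer 1 dynamics
C = [rng.normal(size=(n_pixels, state_dim)) for _ in range(2)]

# Layer assignment variable z_i for each pixel (fixed here; the full
# model places an MRF prior over these to encourage smoothness).
z = np.array([0, 0, 0, 1, 1, 1])

def sample_ldt(A, C, z, n_frames, rng):
    """Sample a video Y (pixels x frames): each layer's hidden state
    evolves independently; pixel i reads out the state of layer z[i]."""
    n_layers = len(A)
    x = [rng.normal(size=A[j].shape[0]) for j in range(n_layers)]
    Y = np.zeros((len(z), n_frames))
    for t in range(n_frames):
        for j in range(n_layers):
            # state transition with process noise
            x[j] = A[j] @ x[j] + 0.1 * rng.normal(size=x[j].shape)
        for i, zi in enumerate(z):
            # observation of the assigned layer's state, plus noise
            Y[i, t] = C[zi][i] @ x[zi] + 0.01 * rng.normal()
    return Y

Y = sample_ldt(A, C, z, n_frames, rng)
print(Y.shape)  # (6, 5)
```

Inference in the actual model is harder than this forward-sampling direction suggests: the coupling between the assignment variables and the hidden state sequences is what makes exact inference intractable and motivates the Gibbs and variational approximations.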