In this paper, we study mathematical models of atomic visual patterns in natural videos and establish a generative visual vocabulary for video representation. Empirically, we use small video patches (e.g., 15×15×5 pixels, called video "bricks") from natural videos as the basic units of analysis. The high-dimensional brick space contains a variety of brick subspaces (or atomic video words) of varying dimensions, whose structures are characterized by both appearance and motion dynamics. We categorize the words into two pure types: structural video words (SVWs) and textural video words (TVWs), and introduce a common generative model that represents both types in a unified form. The representational power of a word is measured by its information gain; based on this measure, words are pursued one by one via a novel pursuit algorithm until a holistic video vocabulary is built. Experimental results demonstrate the potential of our framework for video representation.
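As a minimal illustration of the basic analysis unit described above, the sketch below extracts 15×15×5 bricks from a grayscale video volume on a non-overlapping grid. The function name, stride choice, and array layout are our own assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def extract_bricks(video, size=(15, 15, 5), stride=(15, 15, 5)):
    """Slide a 3D window over a grayscale video volume of shape
    (height, width, frames) and collect each patch ("brick") as a
    flattened vector. Non-overlapping stride is an assumption here."""
    H, W, T = video.shape
    bh, bw, bt = size
    sh, sw, st = stride
    bricks = []
    for t in range(0, T - bt + 1, st):          # temporal position
        for y in range(0, H - bh + 1, sh):      # vertical position
            for x in range(0, W - bw + 1, sw):  # horizontal position
                bricks.append(video[y:y + bh, x:x + bw, t:t + bt].ravel())
    # Each brick is a point in the 15*15*5 = 1125-dimensional brick space.
    return np.stack(bricks)
```

Each returned row is one point in the high-dimensional brick space; a vocabulary-learning step would then fit subspaces (video words) to this collection of points.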