Online Bayesian Video Summarization and Linking

Authors:
Xavier Orriols; Xavier Binefa
Venue:
CIVR '02 Proceedings of the International Conference on Image and Video Retrieval
Year:
2002

Citing 11

Image processing on compressed data for large video databases. MULTIMEDIA '93 Proceedings of the first ACM international conference on Multimedia
Automatic partitioning of full-motion video. Multimedia Systems
Automating the creation of a digital video library. Proceedings of the third ACM international conference on Multimedia
Exponentiated gradient versus gradient descent for linear predictors. Information and Computation
Probabilistic Visual Learning for Object Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence
A visual search system for video and image databases. ICMCS '97 Proceedings of the 1997 International Conference on Multimedia Computing and Systems
Feature-Based Algorithms for Detecting and Classifying Scene Breaks
Example Based Learning for View-Based Human Face Detection
Eigenfaces for recognition. Journal of Cognitive Neuroscience
Improving Color Based Video Shot Detection. ICMCS '99 Proceedings of the 1999 IEEE International Conference on Multimedia Computing and Systems - Volume 02
Modeling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks

Cited 2

Challenges of Image and Video Retrieval. CIVR '02 Proceedings of the International Conference on Image and Video Retrieval
Vlogging: A survey of videoblogging technology on the web. ACM Computing Surveys (CSUR)

Abstract

In this paper, an online Bayesian formulation is presented to detect and describe the most significant key-frames and shot boundaries of a video sequence. Visual information is encoded with a reduced number of degrees of freedom in order to provide robustness to noise, gradual transitions, flashes, camera motion and illumination changes. We present an online algorithm in which images are classified according to their appearance content (pixel values plus shape information) in order to obtain a structured representation from sequential information. This structured representation is laid out on a grid whose nodes correspond to the location of the representative image of each cluster. Since the estimation process simultaneously takes into account clustering and the nodes' locations in the representation space, key-frames are placed according to their visual similarity to their neighbors. This not only provides a powerful tool for video navigation but also offers an organization for subsequent higher-level analysis, such as identifying news items, interviews, etc.
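
The abstract does not give the model's equations, so the following is only a rough sketch of the general idea: frames arrive one at a time, each is assigned to the nearest node of a grid, and neighboring nodes are pulled along so that visually similar clusters end up adjacent. It substitutes a simple Kohonen-style self-organizing-map update for the paper's Bayesian estimation, and it assumes each frame has already been reduced to a low-dimensional feature vector (for example by PCA). The function name `online_grid_summary` and all of its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def online_grid_summary(features, grid_shape=(2, 2), lr=0.5, sigma=0.5,
                        init_nodes=None, seed=0):
    """Illustrative stand-in (an assumption), not the paper's Bayesian method."""
    features = np.asarray(features, dtype=float)
    rows, cols = grid_shape
    # Grid coordinates of each node, used to measure neighbourhood on the grid.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if init_nodes is None:
        # Start every node near the first frame, with a small random jitter.
        rng = np.random.default_rng(seed)
        nodes = rng.normal(loc=features[0], scale=1e-3,
                           size=(rows * cols, features.shape[1]))
    else:
        nodes = np.array(init_nodes, dtype=float)
    labels = np.empty(len(features), dtype=int)
    for t, x in enumerate(features):  # single pass, one frame at a time
        winner = int(np.argmin(np.linalg.norm(nodes - x, axis=1)))
        labels[t] = winner
        # Nodes close to the winner on the grid are updated as well.
        grid_dist2 = np.sum((coords - coords[winner]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        step = lr / (1.0 + 0.01 * t)  # slowly decaying learning rate
        nodes += step * h[:, None] * (x - nodes)
    # Key-frame of a node: the frame assigned to it that lies closest to it.
    keyframes = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        d = np.linalg.norm(features[members] - nodes[k], axis=1)
        keyframes[int(k)] = int(members[np.argmin(d)])
    # Candidate shot boundaries: frames whose node differs from the previous frame's.
    boundaries = np.flatnonzero(np.diff(labels) != 0) + 1
    return nodes, labels, keyframes, boundaries
```

Called on a feature matrix with one row per frame, the function returns the final node locations, the node index assigned to each frame, one key-frame index per occupied node, and the frame indices at which the assigned node changes. A plain label change is a crude boundary criterion; it does nothing for the robustness to gradual transitions or flashes that the abstract attributes to the reduced representation.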