Topic models for semantics-preserving video compression

Authors:
Jörn Wanke;Adrian Ulges;Christoph H. Lampert;Thomas M. Breuel
Affiliations:
University of Kaiserslautern, Kaiserslautern, Germany;German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany;Max Planck Institute for Biological Cybernetics, Tübingen, Germany;University of Kaiserslautern, Kaiserslautern, Germany
Venue:
Proceedings of the international conference on Multimedia information retrieval
Year:
2010

Abstract

Most state-of-the-art systems for content-based video understanding require video content to be represented as collections of many low-level descriptors, e.g., histograms of the color, texture, or motion in local image regions. To preserve as much of the information contained in the original video as possible, these representations are typically high-dimensional, which conflicts with the goal of compact descriptors that allow better efficiency and lower storage requirements. In this paper, we address the problem of semantic compression of video, i.e., the reduction of low-level descriptors to a small number of dimensions while preserving most of the semantic information. To this end, we adapt topic models, which have previously been used as compact representations of still images, to take into account the temporal structure of a video as well as multi-modal components such as motion information. Experiments on a large-scale collection of YouTube videos show that we can achieve a compression ratio of 20:1 compared to ordinary histogram representations and at least 2:1 compared to other dimensionality reduction techniques, without significant loss of prediction accuracy. We also demonstrate improvements from our video-specific extensions, which model temporal structure and multiple modalities.
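
To make the idea of reducing a descriptor histogram to a handful of topic weights concrete, the following is a minimal sketch, not the authors' method. It assumes pLSA fitted with plain EM as the topic model (the abstract does not say which topic model or training procedure is used), and it ignores the temporal and multi-modal extensions. The function names, the toy data, the 20-bin vocabulary, and the choice of 2 topics are all hypothetical values chosen for illustration.

```python
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa(histograms, num_topics, iterations=50, seed=0):
    """Fit pLSA with EM (illustrative, not the paper's model).

    histograms: one list of visual-word counts per video.
    Returns (p_z_d, p_w_z): topic mixtures per video and word
    distributions per topic.
    """
    rng = random.Random(seed)
    num_docs = len(histograms)
    vocab_size = len(histograms[0])

    # Random initialization of P(w|z) and P(z|d).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(vocab_size)])
             for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(num_topics)])
             for _ in range(num_docs)]

    for _ in range(iterations):
        # Small constants keep every probability strictly positive.
        new_w_z = [[1e-12] * vocab_size for _ in range(num_topics)]
        new_z_d = [[1e-12] * num_topics for _ in range(num_docs)]
        for d, hist in enumerate(histograms):
            for w, count in enumerate(hist):
                if count == 0:
                    continue
                # E-step: posterior P(z|d,w) for this (video, word) pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(num_topics)])
                # M-step accumulation of expected counts.
                for z in range(num_topics):
                    new_w_z[z][w] += count * post[z]
                    new_z_d[d][z] += count * post[z]
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_z_d, p_w_z


def toy_histogram(active_bins, vocab_size, rng):
    """Hypothetical descriptor histogram with counts only in active_bins."""
    return [rng.randint(1, 9) if w in active_bins else 0
            for w in range(vocab_size)]


VOCAB_SIZE = 20   # toy histogram dimensionality (assumption)
NUM_TOPICS = 2    # toy compressed dimensionality (assumption)

data_rng = random.Random(42)
group_a = [toy_histogram(range(0, 10), VOCAB_SIZE, data_rng) for _ in range(3)]
group_b = [toy_histogram(range(10, 20), VOCAB_SIZE, data_rng) for _ in range(3)]
videos = group_a + group_b

mixtures, topics = plsa(videos, NUM_TOPICS)

for index, mixture in enumerate(videos and mixtures):
    print(index, [round(p, 3) for p in mixture])
print("compression ratio:", VOCAB_SIZE / NUM_TOPICS)
```

Each video is then described by its topic mixture (here 2 numbers) instead of the full histogram (here 20 bins), so the toy compression ratio is simply the vocabulary size divided by the number of topics. The ratios reported in the abstract come from the authors' experiments and cannot be reproduced by this sketch.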