Visual cue cluster construction via information bottleneck principle and kernel density estimation

Authors:
Winston H. Hsu;Shih-Fu Chang
Affiliations:
Dept. of Electrical Engineering, Columbia University, New York, NY;Dept. of Electrical Engineering, Columbia University, New York, NY
Venue:
CIVR'05 Proceedings of the 4th international conference on Image and Video Retrieval
Year:
2005

Abstract

Recent research in video analysis has shown a promising direction in which mid-level features (e.g., people, anchor, indoor) are abstracted from low-level features (e.g., color, texture, motion) and used for discriminative classification of semantic labels. In most systems, however, such mid-level features are selected manually. In this paper, we propose an information-theoretic framework, visual cue cluster construction (VC3), to automatically discover adequate mid-level features. The problem is posed as mutual information maximization, through which optimal cue clusters are discovered that preserve the most information about the semantic labels. We extend the Information Bottleneck framework to high-dimensional continuous features and further propose a projection method that maps each video into probabilistic memberships over all the cue clusters. The main advantage of the proposed approach is that it removes both the dependence on manual selection of mid-level features and the heavy labor cost of annotating a training corpus for the detector of each mid-level feature. The proposed VC3 framework is general and effective, and it has the potential to address other problems in semantic video analysis. When tested on news video story segmentation, the proposed approach achieves a promising performance gain over representations derived from conventional clustering techniques and even over manually selected mid-level features.
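
The objective named in the abstract, the mutual information between cue clusters and semantic labels, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes hard, discrete cluster assignments and toy labels, whereas the paper works with continuous features and kernel density estimates. The function name `mutual_information` and the sample data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(clusters, labels):
    """Estimate I(C; Y) in bits from paired cluster ids and semantic labels.

    Assumption: hard assignments and empirical frequencies stand in for the
    probabilities that the paper estimates from continuous features.
    """
    n = len(clusters)
    joint = Counter(zip(clusters, labels))   # counts of (cluster, label) pairs
    count_c = Counter(clusters)              # marginal counts per cluster
    count_y = Counter(labels)                # marginal counts per label
    mi = 0.0
    for (c, y), count in joint.items():
        p_cy = count / n
        p_c = count_c[c] / n
        p_y = count_y[y] / n
        mi += p_cy * math.log2(p_cy / (p_c * p_y))
    return mi


# Toy data: 1 = story boundary, 0 = non-boundary (hypothetical labels).
labels = [1, 1, 0, 0, 1, 0, 0, 1]
informative = [0, 0, 1, 1, 0, 1, 1, 0]     # clusters that mirror the labels
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]   # clusters independent of the labels

mi_good = mutual_information(informative, labels)
mi_bad = mutual_information(uninformative, labels)
print(mi_good, mi_bad)
```

A clustering that preserves more information about the labels scores higher under this measure, which is the sense in which the abstract speaks of cue clusters that preserve the most information about the semantic labels.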