Submodular video hashing: a unified framework towards video pooling and indexing

Authors: Liangliang Cao; Zhenguo Li; Yadong Mu; Shih-Fu Chang
Affiliations: IBM Watson Research Center, Hawthorne, NY, USA (Cao); Columbia University, New York City, NY, USA (Li, Mu, Chang)
Venue: Proceedings of the 20th ACM International Conference on Multimedia
Year: 2012

Abstract

This paper develops a novel framework for efficient large-scale video retrieval. We aim to find videos according to higher-level similarities, which lie beyond the scope of traditional near-duplicate search. Following popular hashing techniques, we employ compact binary codes to facilitate nearest neighbor search. Unlike previous methods, which rely on only one type of hash code for retrieval, this paper combines heterogeneous hash codes to effectively describe the diverse and multi-scale visual content of videos. Our method integrates feature pooling and hashing in a single framework. In the pooling stage, we cast video frames into a set of pre-specified components, which capture a variety of semantics of the video content. In the hashing stage, we represent each video component as a compact hash code and combine multiple hash codes into hash tables for effective search. To speed up retrieval while retaining the most informative codes, we propose a graph-based influence maximization method to bridge the pooling and hashing stages. We show that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution. Our method is very efficient: it retrieves thousands of video clips from the TRECVID dataset in about 0.001 seconds, and on a larger synthetic dataset with 1M samples it answers 100 queries in less than 1 second. We extensively evaluate the method in both unsupervised and supervised scenarios, and the results on the TRECVID Multimedia Event Detection and Columbia Consumer Video datasets demonstrate the effectiveness of the proposed technique.
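The key optimization claim in the abstract is that influence maximization is submodular, so greedy selection is nearly optimal. For a nonnegative, monotone submodular objective under a cardinality constraint, greedily picking k items is known to reach at least a (1 - 1/e) ≈ 0.63 fraction of the optimal value. The abstract does not give the exact influence function, so the Python sketch below assumes a hypothetical one: candidate hash codes are graph nodes, `weights[i][j]` is an assumed nonnegative similarity between nodes i and j, and a selected set influences each node through its most similar selected member (a facility-location function, which is monotone submodular). The names `influence` and `greedy_select` and the toy weights are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def influence(selected: Sequence[int], weights: List[List[float]]) -> float:
    """Hypothetical facility-location influence (not the paper's exact objective).

    Each node i is credited with its similarity to the most similar selected
    node, so adding a node never lowers the score (monotone) and helps less
    once similar nodes are already selected (diminishing returns).
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in weights)


def greedy_select(weights: List[List[float]], k: int) -> List[int]:
    """Select up to k nodes, each time adding the one with the largest marginal gain."""
    n = len(weights)
    selected: List[int] = []
    current = 0.0
    for _ in range(min(k, n)):
        best_node, best_gain = -1, -1.0
        for v in range(n):
            if v in selected:
                continue
            # Marginal gain of adding v to the current selection.
            gain = influence(selected + [v], weights) - current
            if gain > best_gain:
                best_node, best_gain = v, gain
        selected.append(best_node)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Toy similarity graph (assumed values): cluster {0, 1, 2} centred on
    # node 1 and cluster {3, 4, 5} centred on node 4.
    W = [
        [1.0, 0.9, 0.5, 0.0, 0.0, 0.0],
        [0.9, 1.0, 0.9, 0.0, 0.0, 0.0],
        [0.5, 0.9, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 0.8, 0.4],
        [0.0, 0.0, 0.0, 0.8, 1.0, 0.8],
        [0.0, 0.0, 0.0, 0.4, 0.8, 1.0],
    ]
    print(greedy_select(W, 2))  # -> [1, 4]
```

Greedy first takes the central node of the larger cluster, then skips its near-duplicates (their marginal gain is only 0.1) in favour of the other cluster's centre, which is the diminishing-returns behaviour submodularity captures. Each step here re-evaluates every remaining node; lazy evaluation of marginal gains is a common speed-up, valid because a node's gain can only shrink as the selection grows.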
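The hashing stage represents each video component as a compact code and combines the codes into hash tables. As a minimal sketch of that general idea, under assumptions that go beyond the abstract, the Python code below keeps one table per component, probes each table for codes within a small Hamming radius of the query, and ranks the pooled candidates by total Hamming distance across components. The 8-bit code length, the probing radius, the clip names, and the class `MultiTableIndex` are hypothetical choices for illustration; this is generic multi-table lookup, not the paper's selection of the most informative codes.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Sequence, Set

BITS = 8  # assumed code length per component (illustrative only)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def neighbors(code: int, radius: int, bits: int = BITS) -> Iterator[int]:
    """Yield every code within `radius` bit flips of `code` (including itself)."""
    for r in range(radius + 1):
        for positions in combinations(range(bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped


class MultiTableIndex:
    """Hypothetical index: one hash table per video component, code -> video ids."""

    def __init__(self, num_components: int) -> None:
        self.tables: List[Dict[int, Set[str]]] = [
            defaultdict(set) for _ in range(num_components)
        ]
        self.codes: Dict[str, Sequence[int]] = {}

    def add(self, video_id: str, component_codes: Sequence[int]) -> None:
        self.codes[video_id] = component_codes
        for table, code in zip(self.tables, component_codes):
            table[code].add(video_id)

    def query(self, component_codes: Sequence[int], radius: int = 1) -> List[str]:
        # Gather candidates from every table within the Hamming radius.
        candidates: Set[str] = set()
        for table, code in zip(self.tables, component_codes):
            for probe in neighbors(code, radius):
                candidates |= table.get(probe, set())

        # Rank by total Hamming distance over all components (ties by id).
        def total_distance(vid: str) -> int:
            return sum(hamming(q, c) for q, c in zip(component_codes, self.codes[vid]))

        return sorted(candidates, key=lambda vid: (total_distance(vid), vid))


if __name__ == "__main__":
    index = MultiTableIndex(num_components=2)
    index.add("clip_a", [0b10110000, 0b00001111])
    index.add("clip_b", [0b10110001, 0b11110000])
    index.add("clip_c", [0b01001111, 0b11110000])
    print(index.query([0b10110000, 0b00001110], radius=1))  # -> ['clip_a', 'clip_b']
```

Probing a Hamming ball costs the sum of C(bits, r) for r up to the radius in lookups per table (9 lookups for 8-bit codes at radius 1), so this style of lookup stays cheap only for short codes and small radii.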