Places in movies and sitcoms can provide higher-level semantic cues about story scenarios and actor relations. This paper presents a novel unsupervised framework for efficient place retrieval in movies and sitcoms. We use face detection to filter close-up frames out of the video dataset, and adopt saliency map analysis to separate background places from foreground actions. We then extract a pyramid-based, spatially encoded correlogram from the key frames of each shot for robust place representation. To describe the varying appearances of a place effectively, we cluster the key frames and model which clusters belong to the same place through within-shot association. A hierarchical normalized cut is then applied to the association graph to differentiate the physical places within the videos and to obtain a multi-view representation of each place as a tree structure. For efficient place matching in a large-scale database, inverted indexing is applied to the hierarchical graph structure, and an approximate nearest neighbor search built on this index greatly accelerates the search process. Experimental results on a database of more than 36 hours of the sitcom Friends demonstrate the effectiveness, efficiency, and semantic revealing ability of our framework.
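The inverted-index lookup mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a flat set of key-frame clusters rather than the hierarchical tree, uses 2-D toy vectors in place of correlogram features, and ranks places by simple vote counting. The function names, the place labels, and the parameter `k` are all hypothetical.

```python
from collections import defaultdict
from math import dist


def build_inverted_index(place_clusters):
    """Map each cluster ID to the set of places whose key frames fall in it."""
    index = defaultdict(set)
    for place, clusters in place_clusters.items():
        for cluster_id in clusters:
            index[cluster_id].add(place)
    return index


def nearest_clusters(feature, centroids, k=2):
    """Return the IDs of the k centroids closest to the query feature."""
    ranked = sorted(centroids, key=lambda cid: dist(feature, centroids[cid]))
    return ranked[:k]


def search_places(feature, centroids, index, k=2):
    """Vote for places listed under the query's k nearest clusters."""
    votes = defaultdict(int)
    for cid in nearest_clusters(feature, centroids, k):
        for place in index.get(cid, ()):
            votes[place] += 1
    # Sort by vote count (descending), then by name for a stable order.
    return sorted(votes, key=lambda p: (-votes[p], p))


# Toy data: 2-D stand-ins for correlogram features (hypothetical values).
centroids = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0), 3: (6.0, 5.0)}
place_clusters = {"cafe": {0, 1}, "apartment": {2, 3}}
index = build_inverted_index(place_clusters)
print(search_places((0.2, 0.1), centroids, index))
```

The search is approximate because only the places posted under the `k` nearest clusters are ever scored; the remaining places in the database are never compared against the query, which is where the speed-up in this sketch comes from.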