Movie segmentation into scenes and chapters using locally weighted bag of visual words

Authors:
Vasileios Chasanis;Argyris Kalogeratos;Aristidis Likas
Affiliations:
University of Ioannina, GR;University of Ioannina, GR;University of Ioannina, GR
Venue:
Proceedings of the ACM International Conference on Image and Video Retrieval
Year:
2009

Abstract

Segmenting movies into semantically correlated units is a difficult task because of the "semantic gap". Low-level features do not provide useful information about the semantic correlation between shots and usually fail to detect scenes with constantly dynamic content. In the method we propose, local invariant descriptors represent the key-frames of video shots, and a visual vocabulary built from these descriptors yields a histogram of visual words (bag of visual words) for each shot. A key aspect of our method is that, based on an idea from text segmentation, the histogram of each shot is further smoothed temporally by taking into account the histograms of neighboring shots. In this way, valuable contextual information is preserved. The final boundaries are placed at the local maxima of the difference between successive smoothed histograms, using low values of the smoothing parameter for scenes and high values for chapters. Numerical experiments indicate that our method provides high detection rates while preserving a good tradeoff between recall and precision.
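
The smoothing and boundary-detection steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the smoothing kernel or the difference measure, so the Gaussian kernel, the L1 distance, and the names `smooth_histograms`, `boundary_candidates`, and `sigma` are all assumptions made for illustration. The input is assumed to be one bag-of-visual-words histogram per shot, in temporal order.

```python
import numpy as np

def smooth_histograms(hists, sigma):
    """Temporally smooth per-shot histograms.

    Assumption: a Gaussian kernel over shot indices; the abstract only
    says that neighboring shots' histograms are taken into account.
    """
    hists = np.asarray(hists, dtype=float)
    idx = np.arange(hists.shape[0])
    # weights[i, j]: influence of shot j on the smoothed histogram of shot i
    weights = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ hists

def boundary_candidates(hists, sigma):
    """Return (boundaries, diffs) for a given smoothing parameter.

    diffs[k] is the L1 distance (an assumed measure) between the smoothed
    histograms of shots k and k + 1. A boundary is reported as the index
    of the first shot after a strict local maximum of diffs.
    """
    smoothed = smooth_histograms(hists, sigma)
    diffs = np.abs(np.diff(smoothed, axis=0)).sum(axis=1)
    peaks = [k for k in range(1, len(diffs) - 1)
             if diffs[k] > diffs[k - 1] and diffs[k] > diffs[k + 1]]
    return [k + 1 for k in peaks], diffs
```

Under these assumptions, a small `sigma` would be used to look for scene boundaries and a larger `sigma` for chapter boundaries, for example `boundary_candidates(hists, sigma=1.0)` versus `boundary_candidates(hists, sigma=8.0)`. The specific values are placeholders, not figures from the paper.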