Segmenting movies into semantically correlated units is a tedious task because of the "semantic gap". Low-level features provide little useful information about the semantic correlation between shots and usually fail to detect scenes with constantly dynamic content. In the method we propose herein, local invariant descriptors are used to represent the key-frames of video shots, and a visual vocabulary is created from these descriptors, resulting in a histogram of visual words (bag of visual words) for each shot. A key aspect of our method is that, based on an idea from text segmentation, the visual word histogram of each shot is further smoothed temporally by taking into account the histograms of neighboring shots. In this way, valuable contextual information is preserved. The final scene and chapter boundaries are placed at the local maxima of the difference between successive smoothed histograms, using low values of the smoothing parameter for scenes and high values for chapters. Numerical experiments indicate that our method provides high detection rates while preserving a good tradeoff between recall and precision.
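The smoothing and boundary-detection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the smoothing kernel or the histogram distance, so the Gaussian weighting over shot indices, the Euclidean distance, and the names `smooth_histograms`, `boundary_candidates` and `sigma` are all assumptions made for illustration. The input is assumed to be one bag-of-visual-words histogram per shot, in temporal order.

```python
import math


def smooth_histograms(hists, sigma):
    """Replace each shot histogram by a weighted average of all shot histograms.

    Assumption: weights follow a Gaussian over the shot-index distance, with
    `sigma` playing the role of the smoothing parameter.
    """
    n = len(hists)
    dim = len(hists[0])
    smoothed = []
    for i in range(n):
        weights = [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(n)]
        total = sum(weights)
        smoothed.append([
            sum(w * h[k] for w, h in zip(weights, hists)) / total
            for k in range(dim)
        ])
    return smoothed


def boundary_candidates(hists, sigma):
    """Return shot indices that start a new segment.

    A boundary is reported where the distance between successive smoothed
    histograms is a strict local maximum. Assumption: Euclidean distance.
    """
    smoothed = smooth_histograms(hists, sigma)
    diffs = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(smoothed[i], smoothed[i + 1])))
        for i in range(len(smoothed) - 1)
    ]
    # diffs[i] compares shot i with shot i + 1, so a peak at i means
    # that shot i + 1 is the first shot of the next segment.
    return [
        i + 1
        for i in range(1, len(diffs) - 1)
        if diffs[i] > diffs[i - 1] and diffs[i] > diffs[i + 1]
    ]
```

Under these assumptions, calling `boundary_candidates` with a small `sigma` would give candidate scene boundaries, and calling it again with a larger `sigma` would give the coarser chapter boundaries, mirroring the low and high smoothing values mentioned in the abstract.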