Bag of visual words model for videos segmentation into scenes

  • Authors:
  • Junaid Baber;Shin'ichi Satoh;Nitin Afzulpurkar;Chadaporn Keatmanee

  • Affiliations:
  • Asian Institute of Technology, Thailand;National Institute of Informatics, Japan;Asian Institute of Technology, Thailand;Sripatum University, Thailand

  • Venue:
  • Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
  • Year:
  • 2013

Abstract

With the advancement of multimedia technologies, video databases are growing exponentially in size, which creates many challenges for efficient indexing and retrieval of videos. Interactive and efficient search engines allow users to query part of a video or search for particular scenes in video databases. Segmentation and retrieval of video scenes are gaining popularity because they make video indexing more flexible and efficient. In this paper, we automatically segment videos into scenes (groups of visually related video shots). We first segment the videos into shots and then merge shots that are visually similar. We represent shots with the bag-of-visual-words (BoVW) model and compute the similarity of shots within a sliding window of length L; the sliding window makes similarity computation efficient because each shot is compared only with its L neighbors instead of the whole pool of shots. Experiments on cinematic videos and dramas show the effectiveness of the proposed technique.
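The windowed merging step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each shot has already been reduced to a BoVW histogram (a fixed-length vector of visual-word counts), uses cosine similarity as the shot-similarity measure, and greedily appends a shot to the current scene if it is similar enough to any of the last L shots in that scene; the threshold value and the exact neighbor comparison scheme are assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two BoVW histograms (0.0 if either is empty)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def segment_scenes(shot_histograms, L=3, threshold=0.5):
    """Greedily group consecutive shots into scenes.

    A shot joins the current scene if it is similar enough to any of the
    previous L shots in that scene; otherwise it starts a new scene.
    Comparing only against L neighbors keeps the cost linear in the
    number of shots instead of quadratic.
    """
    scenes = [[0]]  # list of scenes, each a list of shot indices
    for i in range(1, len(shot_histograms)):
        recent = scenes[-1][-L:]  # the sliding window of neighbors
        if any(cosine_sim(shot_histograms[i], shot_histograms[j]) >= threshold
               for j in recent):
            scenes[-1].append(i)
        else:
            scenes.append([i])
    return scenes

# Toy example: 5 shots over a 3-word vocabulary; the first three shots share
# visual words, the last two share different ones.
hists = np.array([[10, 0, 0],
                  [9, 1, 0],
                  [8, 2, 0],
                  [0, 0, 10],
                  [1, 0, 9]], dtype=float)
print(segment_scenes(hists, L=3, threshold=0.5))  # → [[0, 1, 2], [3, 4]]
```

In a real pipeline the histograms would come from quantizing local descriptors (e.g. SIFT) of keyframes against a learned visual vocabulary; the greedy merge above is only one simple way to exploit the sliding window.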