Motion Activity Based Shot Identification and Closed Caption Detection for Video Structuring

Authors:
Duan-Yu Chen; Shu-Jiuan Lin; Suh-Yin Lee
Venue:
VISUAL '02 Proceedings of the 5th International Conference on Recent Advances in Visual Information Systems
Year:
2002

Abstract

In this paper, we propose a novel approach to generating a table of video content from shot descriptions based on motion activity and closed captions in MPEG-2 video streams. Videos are segmented into shots by a GOP-based approach, and shot identification is then applied to classify the segmented shots. Shots of interest are selected, and the proposed closed caption detection method is used to detect captions in them. To speed up scene change detection, the GOP-based approach does not examine scene cuts frame by frame; instead, it first checks the video stream GOP by GOP and then locates the actual scene boundaries at the frame level. Segmented shots containing closed captions are identified by the proposed object-based motion activity descriptor. A Self-Organizing Map (SOM) is used to filter out noise during caption localization. Once captions are localized in the recognized shots, we create the table of video content based on a hierarchical structure of story units, consecutive shots, and captioned frames. The experimental results show the effectiveness of the proposed approach and demonstrate the feasibility of hierarchically structuring video content.
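
The coarse-to-fine idea behind GOP-based scene change detection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features or thresholds are used, so the per-frame feature vectors, the L1 distance, and the names detect_shot_boundaries, gop_size, and gop_threshold are all assumptions made for illustration. The sketch compares the first frame of each GOP with the first frame of the next one, and only when they differ strongly does it scan that span frame by frame.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(
    features: List[Sequence[float]],
    gop_size: int,
    gop_threshold: float,
) -> List[int]:
    """Return frame indices at which a new shot is assumed to start.

    features      -- one feature vector per frame (hypothetical input,
                     e.g. a coarse histogram; the paper's features are
                     not given in the abstract)
    gop_size      -- number of frames per GOP (assumed constant here)
    gop_threshold -- assumed distance above which two GOP start frames
                     are considered to belong to different shots
    """
    boundaries = []
    n = len(features)
    # Coarse pass: compare the first frame of each GOP with the first
    # frame of the following GOP.
    for start in range(0, n - gop_size, gop_size):
        nxt = start + gop_size
        if l1_distance(features[start], features[nxt]) <= gop_threshold:
            continue  # no change suspected, skip the frame-level scan
        # Fine pass: within the suspicious span, pick the frame whose
        # difference from its predecessor is largest.
        best = max(
            range(start + 1, nxt + 1),
            key=lambda i: l1_distance(features[i - 1], features[i]),
        )
        boundaries.append(best)
    return boundaries


# Toy input: 12 frames with an abrupt change at frame 6.
frames = [[0.0]] * 6 + [[1.0]] * 6
cuts = detect_shot_boundaries(frames, gop_size=4, gop_threshold=0.5)
```

With the toy input above, only the second pair of GOPs triggers the frame-level scan, so most frames are never compared individually. That skipped work is the speed-up the GOP-level check is meant to provide.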