The location of a video scene is an important semantic descriptor, especially for broadcast news video. In this paper, we propose a learning-based approach that annotates shots of news video with locations extracted from the video transcript. The approach draws on features from multiple video modalities, including the syntactic structure of transcript sentences, speaker identity, and temporal video structure, among others. Machine learning algorithms combine these multi-modal features to solve two sub-problems: (1) whether the location of a video shot is mentioned in the transcript at all, and if so, (2) which of the many locations in the transcript are the correct one(s) for this shot. Experiments on the TRECVID dataset demonstrate that our approach achieves approximately 85% accuracy in correctly labeling the location of any shot in news video.
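The two sub-problems can be read as a two-stage decision per shot. The following is a minimal sketch of that control flow only, not the authors' implementation: the linear scorer with hand-set weights stands in for whatever learned classifiers the paper uses, and every feature name (`reporter_on_scene`, `temporal_proximity`, and so on), weight, and threshold is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

Features = Dict[str, float]


@dataclass
class Candidate:
    """A location name found in the transcript, paired with one shot."""
    location: str
    features: Features  # hypothetical (shot, location) features


def linear_score(weights: Features, features: Features, bias: float = 0.0) -> float:
    """Weighted sum of features; a stand-in for a trained classifier's score."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())


def annotate_shot(shot_features: Features,
                  candidates: List[Candidate],
                  mention_weights: Features,
                  match_weights: Features,
                  match_bias: float = 0.0) -> List[str]:
    """Return the transcript locations assigned to one shot (possibly none)."""
    # Stage 1: is this shot's location mentioned in the transcript at all?
    if not candidates or linear_score(mention_weights, shot_features) <= 0.0:
        return []
    # Stage 2: among the transcript locations, keep those scored as correct.
    return [c.location for c in candidates
            if linear_score(match_weights, c.features, match_bias) > 0.0]


# Hand-set, purely illustrative weights (assumptions, not learned values).
MENTION_WEIGHTS = {"anchor_speaking": -1.0, "reporter_on_scene": 2.0}
MATCH_WEIGHTS = {"temporal_proximity": 1.5, "in_prepositional_phrase": 1.0}
MATCH_BIAS = -1.0

candidates = [
    Candidate("Baghdad", {"temporal_proximity": 1.0, "in_prepositional_phrase": 1.0}),
    Candidate("Washington", {"temporal_proximity": 0.1, "in_prepositional_phrase": 0.0}),
]

field_shot = {"anchor_speaking": 0.0, "reporter_on_scene": 1.0}
studio_shot = {"anchor_speaking": 1.0, "reporter_on_scene": 0.0}

field_labels = annotate_shot(field_shot, candidates,
                             MENTION_WEIGHTS, MATCH_WEIGHTS, MATCH_BIAS)
studio_labels = annotate_shot(studio_shot, candidates,
                              MENTION_WEIGHTS, MATCH_WEIGHTS, MATCH_BIAS)
```

In this toy setup the field shot passes stage 1 and keeps only the candidate whose stage 2 score is positive, while the studio shot is rejected at stage 1 and receives no location. Separating the stages lets a shot be left unlabeled when the transcript never names its location, rather than forcing a choice among unrelated place names.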