Recognizing objects and scenes in news videos

Authors:
Muhammet Baştan;Pınar Duygulu
Affiliations:
Department of Computer Engineering, Bilkent University, Ankara, Turkey;Department of Computer Engineering, Bilkent University, Ankara, Turkey
Venue:
CIVR'06 Proceedings of the 5th international conference on Image and Video Retrieval
Year:
2006

Abstract

We propose a new approach to recognizing objects and scenes in news videos, motivated by the availability of large video collections. The approach treats recognition as the translation of visual elements into words. Correspondences between visual elements and words are learned using methods adapted from statistical machine translation and are then used to predict words for particular image regions (region naming) or for entire images (auto-annotation), or to associate automatically generated speech transcript text with the correct video frames (video alignment). Experimental results are presented on the TRECVID 2004 data set, which consists of about 150 hours of news videos with associated manual annotations and speech transcript text. The results show that retrieval performance can be improved by associating visual and textual elements. In addition, an extensive analysis of features is provided and a method for combining features is proposed.
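
As a rough illustration of the translation idea, the sketch below learns word-given-visual-token probabilities with expectation-maximization in the style of IBM Model 1. The abstract does not say which translation model is used, so the choice of Model 1 is an assumption, and the token names (`blob_sky` and so on), the toy data, and the function names are all hypothetical.

```python
from collections import defaultdict


def train_model1(pairs, iterations=20):
    """Estimate t[(word, token)] = p(word | visual token) with EM.

    pairs: list of (visual_tokens, words), one entry per annotated image.
    This is an IBM Model 1 style sketch (an assumption, not the paper's code).
    """
    words = {w for _, ws in pairs for w in ws}
    # Start from a uniform distribution over the word vocabulary.
    t = defaultdict(lambda: 1.0 / len(words))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (word, token) co-occurrences
        total = defaultdict(float)  # expected count mass per token
        for tokens, ws in pairs:
            for w in ws:
                # E-step: spread each word over the tokens of the same image.
                norm = sum(t[(w, v)] for v in tokens)
                for v in tokens:
                    frac = t[(w, v)] / norm
                    count[(w, v)] += frac
                    total[v] += frac
        # M-step: renormalize so probabilities sum to one for each token.
        t = defaultdict(
            float, {(w, v): c / total[v] for (w, v), c in count.items()}
        )
    return t


def name_region(t, token, vocabulary):
    """Region naming: return the most probable word for one visual token."""
    return max(sorted(vocabulary), key=lambda w: t[(w, token)])


# Hypothetical toy collection: each image has two regions and two words.
toy_pairs = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_building"], ["sky", "building"]),
    (["blob_grass", "blob_building"], ["grass", "building"]),
]
toy_vocab = {"sky", "grass", "building"}
toy_table = train_model1(toy_pairs)
print(name_region(toy_table, "blob_sky", toy_vocab))
```

In this toy setting no single image says which word belongs to which region, yet the co-occurrence pattern across images is enough for EM to pair each token with its word. Auto-annotation could then be sketched by combining the per-token word probabilities over all regions of an image, though that step is not shown here.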