Tracking news stories across different sources
Proceedings of the 13th annual ACM international conference on Multimedia
In the summer of 2003, using an interactive intelligent tool, over 100 researchers in video understanding annotated over 62 hours of news video from the NIST TRECVID database, spanning six months of 1998. These 47K shots with 433K labels from over 1000 visual concept categories comprise the largest publicly available ground truth for this domain. Our analysis of this data, combining the tools of statistical natural language processing, machine learning, and computer vision, finds significant novel statistical patterns that can be exploited for the accurate tracking of the episodes of a given news story over time, using semantic labels that are solely visual. We find that the ground "truth" is very muddy, but by using the feature selection tool of information gain, we extract 14 reliable visual concepts with mid-frequency use; all but one refer to settings, rather than actors, objects, or events. We discover that the probability that another episode of a named story recurs after a gap of d days is proportional to 1/(d + 1). We define a novel similarity measure between episodes i and j, incorporating both semantic and temporal properties, as Dice(i, j)/(1 + gap(i, j)). We exploit a low-level computer vision technique, normalized cut (Laplacian eigenmaps), to cluster these episodes into stories, and in the process document a weakness of this popular technique. We use these empirical results to make specific recommendations on how to design better visual semantic ontologies for news stories and better video annotation tools.
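The similarity measure above combines the Dice coefficient over the episodes' visual concept labels with their temporal gap in days. A minimal sketch of this computation, assuming each episode is represented by a set of concept labels and a broadcast date (the label sets and dates here are hypothetical):

```python
from datetime import date

def dice(a: set, b: set) -> float:
    """Dice coefficient between two sets of visual concept labels."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def episode_similarity(labels_i: set, labels_j: set,
                       date_i: date, date_j: date) -> float:
    """Semantic-temporal similarity Dice(i, j) / (1 + gap(i, j)),
    where gap(i, j) is the absolute difference in days."""
    gap = abs((date_j - date_i).days)
    return dice(labels_i, labels_j) / (1 + gap)

# Two hypothetical episodes sharing two of three labels, one day apart:
# Dice = 2*2/(3+3) = 2/3, gap = 1, similarity = 1/3.
sim = episode_similarity({"anchor", "map", "studio"},
                         {"anchor", "map", "outdoor"},
                         date(1998, 3, 1), date(1998, 3, 2))
```

The 1/(1 + gap) factor mirrors the observed recurrence pattern: episodes of the same story become proportionally less likely to recur as the gap d grows, so temporally distant episodes are down-weighted before clustering.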