Creating a web-scale video collection for research

Authors:
Paul Over;George Awad;Alan F. Smeaton;Colum Foley;James Lanagan
Affiliations:
National Institute of Standards and Technology, Gaithersburg, MD, USA;National Institute of Standards and Technology, Gaithersburg, MD, USA;Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland;Dublin City University, Dublin, Ireland
Venue:
WSMC '09 Proceedings of the 1st workshop on Web-scale multimedia corpus
Year:
2009

Abstract

This paper begins by considering a number of important design questions for a large-scale, widely available, multimedia test collection intended to support long-term scientific evaluation and comparison of content-based video analysis and exploitation systems. While the collection presented here is not quite web-scale, it is to our knowledge the largest video collection created to date. It is therefore of use in expanding the scale of any evaluation of multimedia collections and systems. Such exploitation systems would include the kinds of functionality already explored within the annual TREC Video Retrieval Evaluation (TRECVid) benchmarking activity, such as search, semantic concept detection, and automatic summarization. We then report on our progress in creating such a multimedia collection from publicly available Internet Archive videos with Creative Commons licenses (IACC.1), which we hope will be a useful approximation of a web-scale collection and will support a next generation of benchmarking activities for content-based video operations. We also report on some possibilities for putting this collection to use in multimedia system evaluation. It is intended that this collection be partitioned and used within the TRECVid 2010 evaluations and in subsequent years.