Natural language descriptions of visual scenes: corpus generation and analysis

  • Authors:
  • Muhammad Usman Ghani Khan, Rao Muhammad Adeel Nawab, Yoshihiko Gotoh

  • Affiliations:
  • University of Sheffield, United Kingdom (all authors)

  • Venue:
  • EACL 2012 Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
  • Year:
  • 2012


Abstract

As video content continues to expand, it is increasingly important to properly annotate videos for effective search, mining, and retrieval. While annotating images with keywords is relatively well explored, further work is needed on annotating videos with natural language to improve the quality of video search. The focus of this work is to present a video dataset with natural language descriptions, a step beyond keyword-based tagging. We describe our initial experiences with a corpus consisting of descriptions for video segments crafted from TREC video data. Analysis of the descriptions created by 13 annotators provides insights into humans' interests and thoughts about videos. Such a resource can also be used to evaluate automatic natural language generation systems for video.