Using objective ground-truth labels created by multiple annotators for improved video classification: A comparative study

  • Authors:
  • Gaurav Srivastava; Josiah A. Yoder; Johnny Park; Avinash C. Kak

  • Venue:
  • Computer Vision and Image Understanding
  • Year:
  • 2013

Abstract

We address the problem of predicting category labels for unlabeled videos in a large video dataset by using a ground-truth set of objectively labeled videos that we have created. Large video databases like YouTube require that a user uploading a new video assign to it a category label from a prescribed set of labels. Such category labeling is likely to be corrupted by the subjective biases of the uploader. Despite their noisy nature, these subjective labels are frequently used as the gold standard in algorithms for multimedia classification and retrieval. Our goal in this paper is not to propose yet another algorithm that predicts labels for unseen videos based on the subjective ground truth. Rather, our goal is to demonstrate that video classification performance can be improved if, instead of using subjective labels, we first create an objectively labeled ground-truth set of videos and then train a classifier on that ground truth to predict objective labels for the unlabeled videos. Our objectively labeled ground-truth dataset is based on the notion that when a video is labeled by a panel of diverse individuals, the majority opinion rendered by the panel may be taken to be the objective opinion. In this manner, using judgments provided by multiple human annotators, we have collected objective labels for a ground-truth dataset of 1000 videos selected at random from the TinyVideos database, which contains roughly 52,000 YouTube videos (courtesy of Karpenko and Aarabi [1]). Through a fourfold cross-validation experiment on the ground-truth set, we demonstrate that the objective labels yield superior classification consistency compared to the subjective labels. We show that this claim holds for several different feature sets that one can use to compare videos and for two different types of classifiers that one can use for label prediction. Subsequently, we use the ground-truth dataset of 1000 videos to predict the objective category labels of the remaining 51,000 videos. We compare the objective labels thus determined with the subjective labels provided by the video uploaders and argue qualitatively that the objective labels are more informative.
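
The label-aggregation step described above amounts to majority voting over a panel's judgments. The following is a minimal sketch in Python; the tie-breaking policy, the variable names, and the example data are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def majority_label(annotator_labels):
    """Return the panel's majority-vote category label.

    annotator_labels: one category label per annotator,
    e.g. ["sports", "sports", "music"] -> "sports".
    Returns None on a tie (an assumed policy: flag the video
    for further review rather than guess).
    """
    ranked = Counter(annotator_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority among the annotators
    return ranked[0][0]

# Hypothetical usage: aggregate per-video judgments into objective labels.
panel_judgments = {
    "video_0001": ["sports", "sports", "music"],
    "video_0002": ["news", "comedy", "news"],
}
objective_labels = {vid: majority_label(labels)
                    for vid, labels in panel_judgments.items()}
print(objective_labels)  # {'video_0001': 'sports', 'video_0002': 'news'}
```

Under this scheme, the aggregated labels are then treated as the objective ground truth on which a classifier is trained, in place of the uploader-supplied subjective labels.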