Using objective ground-truth labels created by multiple annotators for improved video classification: A comparative study

  • Authors:
  • Gaurav Srivastava; Josiah A. Yoder; Johnny Park; Avinash C. Kak

  • Venue:
  • Computer Vision and Image Understanding
  • Year:
  • 2013

Abstract

We address the problem of predicting category labels for unlabeled videos in a large video dataset by using a ground-truth set of objectively labeled videos that we have created. Large video databases like YouTube require that a user uploading a new video assign to it a category label from a prescribed set of labels. Such category labeling is likely to be corrupted by the subjective biases of the uploader. Despite their noisy nature, these subjective labels are frequently used as the gold standard in algorithms for multimedia classification and retrieval. Our goal in this paper is not to propose yet another algorithm that predicts labels for unseen videos based on the subjective ground truth. Rather, our goal is to demonstrate that video classification performance can be improved if, instead of using subjective labels, we first create an objectively labeled ground-truth set of videos and then train a classifier on that ground truth to predict objective labels for the unlabeled videos. Our objectively labeled ground-truth dataset is based on the notion that when a video is labeled by a panel of diverse individuals, the majority opinion rendered by the panel may be taken to be the objective opinion. In this manner, using judgments provided by multiple human annotators, we have collected objective labels for a ground-truth dataset of 1000 videos selected at random from the TinyVideos database, which contains roughly 52,000 YouTube videos (courtesy of Karpenko and Aarabi [1]). Through a fourfold cross-validation experiment on the ground-truth set, we demonstrate that the objective labels yield superior classification consistency compared to the subjective labels. We show that this claim holds for several different feature sets that one can use to compare videos and for two different types of classifiers that one can use for label prediction. Subsequently, we use the ground-truth dataset of 1000 videos to predict the objective category labels of the remaining 51,000 videos. We compare the objective labels thus determined with the subjective labels provided by the video uploaders and argue qualitatively that the objective labels are more informative.
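
The label-aggregation step described above amounts to majority voting over a panel's judgments. The following is a minimal sketch in Python; the tie-breaking policy, the variable names, and the example data are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def majority_label(annotator_labels):
    """Return the panel's majority-vote category label.

    annotator_labels: one category label per annotator,
    e.g. ["sports", "sports", "music"] -> "sports".
    Returns None on a tie (an assumed policy: flag the video
    for further review rather than guess).
    """
    ranked = Counter(annotator_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no clear majority among the annotators
    return ranked[0][0]

# Hypothetical usage: aggregate per-video judgments into objective labels.
panel_judgments = {
    "video_0001": ["sports", "sports", "music"],
    "video_0002": ["news", "comedy", "news"],
}
objective_labels = {vid: majority_label(labels)
                    for vid, labels in panel_judgments.items()}
print(objective_labels)  # {'video_0001': 'sports', 'video_0002': 'news'}
```

Under this scheme, the aggregated labels are then treated as the objective ground truth on which a classifier is trained, in place of the uploader-supplied subjective labels.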