Relevance filtering meets active learning: improving web-based concept detectors

  • Authors:
  • Damian Borth; Adrian Ulges; Thomas M. Breuel

  • Affiliations:
  • University of Kaiserslautern, Kaiserslautern, Germany; German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany; University of Kaiserslautern, Kaiserslautern, Germany

  • Venue:
  • Proceedings of the International Conference on Multimedia Information Retrieval (MIR)
  • Year:
  • 2010

Abstract

We address the challenge of training visual concept detectors on web video as available from portals such as YouTube. In contrast to high-quality but small manually acquired training sets, this setup permits us to scale up concept detection to very large training sets and concept vocabularies. On the downside, web tags are only weak indicators of concept presence, and web video training data contains a substantial amount of non-relevant content. So far, there are two general strategies to overcome this label noise problem, both targeted at discarding non-relevant training content: (1) a manual refinement supported by active learning sample selection, and (2) an automatic refinement using relevance filtering. In this paper, we present a highly efficient approach combining these two strategies in an interleaved setup: manually refined samples are directly used to improve relevance filtering, which in turn provides a better basis for the next active learning sample selection. Our results demonstrate that the proposed combination -- called active relevance filtering -- outperforms both a purely automatic filtering and a manual one based on active learning. For example, using 50 manual labels per concept, an improvement of 5% over automatic filtering and 6% over active learning is achieved. By annotating only 25% of the weak positive samples in the training set, a performance comparable to training on ground-truth labels is reached.
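The interleaved loop described in the abstract can be summarized in a short sketch: a relevance filter scores the weakly labeled (tag-based) samples, the samples the filter is least certain about are sent to an annotator, and the confirmed labels are fed back into the filter before the next round. The sketch below is an illustrative assumption of such a loop, not the authors' implementation; all names (`train_relevance_filter`, `active_relevance_filtering`, `oracle`) are hypothetical, and the toy scoring function stands in for a real classifier trained on visual features.

```python
# Minimal sketch of an interleaved "active relevance filtering" loop
# (illustrative assumption, not the paper's implementation).
import random


def train_relevance_filter(samples, manual_labels):
    """Toy stand-in for a relevance model: returns a scoring function.

    Manually labeled samples get their confirmed score; everything else
    gets a random score. A real system would train a classifier on
    visual features of the weakly labeled samples."""
    def score(sample):
        if sample in manual_labels:
            return 1.0 if manual_labels[sample] else 0.0
        return random.random()
    return score


def active_relevance_filtering(weak_positives, oracle,
                               rounds=5, queries_per_round=10):
    manual_labels = {}  # sample -> True/False, as judged by the annotator
    for _ in range(rounds):
        score = train_relevance_filter(weak_positives, manual_labels)
        # Active learning step: query the samples the filter is least
        # sure about (scores closest to 0.5).
        unlabeled = [s for s in weak_positives if s not in manual_labels]
        unlabeled.sort(key=lambda s: abs(score(s) - 0.5))
        for s in unlabeled[:queries_per_round]:
            manual_labels[s] = oracle(s)  # manual refinement
    # Final filtering: keep samples the improved filter deems relevant.
    score = train_relevance_filter(weak_positives, manual_labels)
    return [s for s in weak_positives if score(s) > 0.5]
```

In this reading, each round of manual annotation directly sharpens the relevance filter, and the sharpened filter in turn determines which samples are worth annotating next, which is the interleaving the abstract credits for matching ground-truth training with only a fraction of the labels.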