Large-scale visual concept detection with explicit kernel maps and power mean SVM

  • Authors:
  • Mats Sjöberg; Markus Koskela; Satoru Ishikawa; Jorma Laaksonen

  • Affiliations:
  • Aalto University School of Science, Espoo, Finland (all four authors)

  • Venue:
  • Proceedings of the 3rd ACM International Conference on Multimedia Retrieval
  • Year:
  • 2013

Abstract

Many emerging application areas in video and image processing require large-scale visual concept detection. Examples include content-based indexing of online user-generated videos and 24/7 archival of TV broadcasts. The current state of the art in concept detection uses bag-of-visual-words features with computationally heavy exponential kernel classifiers. We argue that this classifier approach is not feasible for large-scale real-time applications and propose instead to use combinations of approximate additive kernel classifiers. By using explicit kernel maps and the power mean SVM, followed by fusion of classifiers trained on different features, we achieve high retrieval precision while retaining real-time performance for large sets of concepts. This paper presents a series of experiments with the large-scale TRECVID 2012 video database and the commonly used Fifteen Scene Categories image database. We show significantly improved retrieval performance over standard linear classifiers. Moreover, with late fusion over several visual features, the approximate additive kernels outperform any single exponential kernel while requiring only a fraction of the detection time.
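To make the idea of an explicit kernel map concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Hellinger kernel as a stand-in additive kernel, chosen here only because its feature map is exact and finite-dimensional; the abstract does not say which kernel maps the paper uses. The function names, the toy histograms, and the mean-score fusion rule are all hypothetical, and the power mean SVM itself is not reproduced. The point illustrated is that, after the map, a kernel evaluation reduces to a plain dot product, so detection needs only a linear classifier per feature, followed by late fusion of the per-feature scores.

```python
import math


def l1_normalize(hist):
    """Scale a non-negative histogram so that its entries sum to 1."""
    total = sum(hist)
    if total == 0:
        return [0.0 for _ in hist]
    return [v / total for v in hist]


def hellinger_kernel(x, y):
    """Additive Hellinger kernel: sum_i sqrt(x_i * y_i) on L1-normalized inputs."""
    xn, yn = l1_normalize(x), l1_normalize(y)
    return sum(math.sqrt(a * b) for a, b in zip(xn, yn))


def hellinger_map(hist):
    """Explicit feature map of the Hellinger kernel: element-wise square root."""
    return [math.sqrt(v) for v in l1_normalize(hist)]


def linear_score(weights, bias, features):
    """Linear classifier score on explicitly mapped features."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def late_fusion(scores):
    """Assumed fusion rule: arithmetic mean of the per-feature detector scores."""
    return sum(scores) / len(scores)


# Toy bag-of-visual-words histograms (illustrative values only).
x = [1.0, 3.0]
y = [4.0, 0.0]

# The dot product in the mapped space equals the kernel value.
kernel_value = hellinger_kernel(x, y)
mapped_dot = sum(a * b for a, b in zip(hellinger_map(x), hellinger_map(y)))

# Fuse scores from three hypothetical per-feature detectors.
fused = late_fusion([0.2, 0.4, 0.9])
```

Because the mapped features feed an ordinary linear model, scoring one concept costs a single dot product per feature type rather than one kernel evaluation per support vector, which is the kind of saving the abstract refers to when it contrasts additive kernel approximations with exponential kernel classifiers.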