Retina enhanced SURF descriptors for spatio-temporal concept detection

  • Authors:
  • Sabin Tiberius Strat; Alexandre Benoit; Patrick Lambert; Alice Caplier

  • Affiliations:
  • Sabin Tiberius Strat: LISTIC - Université de Savoie, Annecy Le Vieux, France and LAPI - University "Politechnica" of Bucharest, Bucharest, Romania
  • Alexandre Benoit: LISTIC - Université de Savoie, Annecy Le Vieux, France
  • Patrick Lambert: LISTIC - Université de Savoie, Annecy Le Vieux, France
  • Alice Caplier: Gipsa-Lab - Université de Grenoble, St Martin d'Hères, France

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2014

Abstract

This paper investigates the potential benefit of low-level human vision behaviors in the context of high-level semantic concept detection. Many current approaches rely on the Bag-of-Words (BoW) model, which has proven to be a good choice, especially for object recognition in images. Extending it from static images to video sequences raises new problems, chiefly how to exploit the temporal information related to the concepts to detect (swimming, drinking...). In this study, we apply a human retina model to preprocess video sequences before the state-of-the-art BoW analysis. This preprocessing, designed to enhance relevant information, increases performance by introducing robustness to common image and video problems such as luminance variation, shadows, compression artifacts and noise. Additionally, we propose a new segmentation method that selects low-level spatio-temporal potential areas of interest in the visual scene, without slowing the computation as much as a high-level saliency model would. These approaches are evaluated on the TRECVid 2010 and 2011 Semantic Indexing Task datasets, containing from 130 to 346 high-level semantic concepts. We also experiment with various parameter settings to check their effect on performance.
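The BoW encoding mentioned above amounts to quantizing each local descriptor (here, SURF) against a visual vocabulary and pooling the assignments into a histogram that describes the image or shot. A minimal sketch of that step, assuming toy 2-D descriptors and a two-word codebook in place of real 64-dimensional SURF vectors and a learned vocabulary:

```python
import math

def quantize(descriptor, codebook):
    """Return the index of the nearest codeword (Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(codebook):
        dist = math.dist(descriptor, word)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def bow_histogram(descriptors, codebook):
    """Accumulate nearest-codeword assignments into an L1-normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

# Toy example: two codewords, three descriptors (illustrative values only).
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(0.5, 0.2), (9.0, 9.5), (0.1, 0.0)]
print(bow_histogram(descriptors, codebook))  # -> [0.666..., 0.333...]
```

In a real pipeline, the codebook would be learned (e.g. by k-means over a training set of descriptors) and the resulting histograms fed to a classifier per concept; the retina preprocessing described in the abstract acts on the video frames before descriptor extraction.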