Fisher kernel based relevance feedback for multimodal video retrieval

  • Authors:
  • Ionut Mironica;Bogdan Ionescu;Jasper Uijlings;Nicu Sebe

  • Affiliations:
  • LAPI, University Politehnica of Bucharest, Bucharest, Romania;LAPI, University Politehnica of Bucharest, Bucharest, Romania;DISI, University of Trento, Trento, Italy;DISI, University of Trento, Trento, Italy

  • Venue:
  • Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
  • Year:
  • 2013

Abstract

This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features as the gradient of the log-likelihood of the generative probability distribution that models the feature distribution, taken with respect to the distribution's parameters. In the context of relevance feedback, instead of learning the generative probability distribution over all features of the data, we learn it only over the top retrieved results. Hence, during relevance feedback we create a new Fisher Kernel representation based on the most relevant examples. In addition, we propose to use the Fisher Kernel to capture temporal information by cutting a video into smaller segments, extracting a feature vector from each segment, and representing the resulting feature set with the Fisher Kernel representation. We evaluate our method on the MediaEval 2012 Video Genre Tagging Task, a large dataset containing 26 categories over 15,000 videos totalling up to 2,000 hours of footage. Results show that our method significantly improves over existing state-of-the-art relevance feedback techniques. Furthermore, we show significant improvements by using the Fisher Kernel to capture temporal information, and we demonstrate that Fisher kernels are well suited for this task.
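The core representation in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' implementation): it computes a Fisher vector for a set of descriptors under a single diagonal Gaussian, i.e. the average gradient of the log-likelihood with respect to the mean and standard deviation. The paper uses a full generative model fit on the top-retrieved results; in the temporal variant, each descriptor would be a per-segment feature vector from one video.

```python
def fisher_vector(descriptors, mu, sigma):
    """Fisher vector of a descriptor set under one diagonal Gaussian.

    Returns the concatenated average gradients of log N(x; mu, sigma^2)
    with respect to mu and sigma. (A single-component sketch; real
    Fisher vectors typically use a GMM and a normalization step.)
    """
    d = len(mu)
    n = len(descriptors)
    g_mu = [0.0] * d      # gradient w.r.t. the mean
    g_sigma = [0.0] * d   # gradient w.r.t. the standard deviation
    for x in descriptors:
        for j in range(d):
            diff = x[j] - mu[j]
            g_mu[j] += diff / sigma[j] ** 2
            g_sigma[j] += diff ** 2 / sigma[j] ** 3 - 1.0 / sigma[j]
    # Average over descriptors so the vector is invariant to set size
    # (number of segments in a video).
    return [v / n for v in g_mu] + [v / n for v in g_sigma]
```

If the Gaussian already fits the descriptor set (matching mean and variance), both gradient blocks vanish; descriptors that deviate from the model push the representation away from zero, which is what makes the encoding discriminative.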