Compact bag-of-words visual representation for effective linear classification

  • Authors:
  • Xiaodan Zhuang;Shuang Wu;Pradeep Natarajan

  • Affiliations:
  • Raytheon BBN Technologies, Cambridge, MA, USA;Raytheon BBN Technologies, Cambridge, MA, USA;Raytheon BBN Technologies, Cambridge, MA, USA

  • Venue:
  • Proceedings of the 21st ACM international conference on Multimedia
  • Year:
  • 2013

Abstract

Bag-of-words approaches have been shown to achieve state-of-the-art performance in large-scale multimedia event detection. However, the commonly used histogram representation of bag-of-words requires large codebook sizes and expensive nonlinear kernel-based classifiers for optimal performance. To address these two issues, we present a two-part generative model for compact visual representation, based on the i-vector approach recently proposed for speech and audio modeling. First, we use a Gaussian mixture model (GMM) to model the joint distribution of local descriptors. Second, we use a low-dimensional factor representation that constrains the GMM parameters to a subspace that preserves most of the information. We further extend this method to incorporate overlapping spatial regions, forming a highly compact visual representation that achieves superior performance with fast linear classifiers. We evaluate the method on a large video dataset used in the TRECVID 2011 MED evaluation. With linear classifiers, the proposed representation, with one-tenth of the storage footprint, outperforms the soft-quantization histograms used in the top-performing TRECVID 2011 MED systems.
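The following is a minimal numpy sketch of the two-part idea described above, written in the standard i-vector formulation from speaker recognition rather than reproduced from the paper. A diagonal-covariance GMM supplies per-video Baum-Welch statistics, and the compact representation is the posterior mean of a low-dimensional factor w in the model M = m + T w, where M is the video-specific GMM mean supervector and m the GMM's own mean supervector. Several pieces are assumptions, not the authors' implementation: T_mat is a random stand-in for a total-variability matrix that would normally be trained with EM, one T_mat is shared across all spatial regions, the region layout is hypothetical, and all function names are illustrative.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Per-descriptor component posteriors gamma[t, c] for a diagonal-covariance GMM."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                   # (n, C, D)
    log_det = np.sum(np.log(variances), axis=1)                 # log|Sigma_c|, (C,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (n, C)
    log_lik = -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal)
    log_post = np.log(weights)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order stats N_c and mean-centered first-order stats F_c."""
    gamma = gmm_posteriors(X, weights, means, variances)        # (n, C)
    N = gamma.sum(axis=0)                                       # (C,)
    F = gamma.T @ X - N[:, None] * means                        # sum_t gamma_tc (x_t - m_c), (C, D)
    return N, F

def extract_ivector(X, weights, means, variances, T_mat):
    """Posterior mean of w: (I + T' S^-1 N T)^-1 T' S^-1 F, with S, N diagonal."""
    C, D = means.shape
    R = T_mat.shape[1]
    N, F = baum_welch_stats(X, weights, means, variances)
    inv_var = (1.0 / variances).reshape(C * D)                  # diagonal of Sigma^-1
    N_diag = np.repeat(N, D)                                    # N_c repeated for each of its D dims
    TtSinv = T_mat.T * inv_var[None, :]                         # T' Sigma^-1, (R, CD)
    precision = np.eye(R) + (TtSinv * N_diag[None, :]) @ T_mat  # (R, R)
    return np.linalg.solve(precision, TtSinv @ F.reshape(C * D))

def spatial_ivector(X, xy, regions, ubm, T_mat):
    """Concatenate one i-vector per (possibly overlapping) region.

    regions: hypothetical list of (x0, y0, x1, y1) boxes in normalized image coords.
    """
    parts = []
    for (x0, y0, x1, y1) in regions:
        inside = ((xy[:, 0] >= x0) & (xy[:, 0] <= x1) &
                  (xy[:, 1] >= y0) & (xy[:, 1] <= y1))
        parts.append(extract_ivector(X[inside], *ubm, T_mat))
    return np.concatenate(parts)

# Toy usage with made-up sizes, far smaller than a real system.
rng = np.random.default_rng(0)
C, D, R = 4, 8, 5
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D))
variances = np.ones((C, D))
T_mat = 0.1 * rng.normal(size=(C * D, R))    # stand-in for an EM-trained matrix
X = rng.normal(size=(200, D))                # local descriptors from one video
xy = rng.uniform(size=(200, 2))              # their normalized (x, y) positions
regions = [(0, 0, 1, 1), (0, 0, 0.6, 1), (0.4, 0, 1, 1)]  # whole frame + two overlapping halves
rep = spatial_ivector(X, xy, regions, (weights, means, variances), T_mat)
print(rep.shape)  # (15,)
```

The length of the concatenated vector is (number of regions) × R regardless of how many descriptors a video contains, which is what makes it suitable as a fixed-size input to a fast linear classifier.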