A discrete direct retrieval model for image and video retrieval

  • Authors:
  • Shaolei Feng; R. Manmatha

  • Affiliations:
  • Siemens Corporate Research, Inc., Princeton, NJ, USA; University of Massachusetts, Amherst, MA, USA

  • Venue:
  • CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval

  • Year:
  • 2008

Abstract

This paper proposes a formal framework for image and video retrieval using discrete Markov random fields (MRFs). The training dataset consists of images annotated with keywords (regions are not labeled). The model is built using a discrete vocabulary of vector-quantized region or point features (visterms) generated from the training images. Since performance depends on the size of the vocabulary, a large vocabulary of a couple of million visterms is used. Such large vocabularies cannot be generated by conventional clustering algorithms, so hierarchical k-means is used to generate them. Unlike many previous techniques, our MRF-based model does not require an explicit annotation step for retrieval; it directly ranks all test images according to the posterior probability of an image given a query. Traditionally, most models are trained by maximizing likelihood; instead, this model is trained by maximizing average precision. Image and video retrieval experiments are performed on two standard datasets (a Corel dataset and a TRECVID dataset), which consist of 4,500 images and about 44,100 keyframes, respectively. The results show that, with a large visual vocabulary, the model runs extremely fast even on very large datasets while achieving retrieval performance comparable to the best performing (continuous feature) models.
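The hierarchical k-means ("vocabulary tree") quantization mentioned in the abstract can be sketched briefly. The code below is a minimal, hypothetical illustration assuming numpy and scikit-learn; the class and parameter names (VocabularyTree, branch_factor, depth) are our own and are not taken from the paper, and the paper's discrete MRF ranking model itself is not reproduced here. With branch_factor=10 and depth=6 such a tree can yield on the order of a million leaf visterms, which is the scale the abstract refers to.

```python
# Minimal sketch of hierarchical k-means quantization (not the authors' MRF model).
# Assumes numpy and scikit-learn are available; all names here are illustrative.
import numpy as np
from sklearn.cluster import KMeans


class VocabularyTree:
    """Recursively clusters descriptors; each leaf acts as a discrete visterm ID."""

    def __init__(self, branch_factor=10, depth=6, min_points=50):
        self.branch_factor = branch_factor
        self.depth = depth
        self.min_points = min_points
        self.children = None   # (fitted KMeans, list of child subtrees) for internal nodes
        self.leaf_id = None    # integer visterm ID for leaf nodes

    def fit(self, descriptors, _level=0, _counter=None):
        counter = _counter if _counter is not None else [0]
        # Stop splitting at maximum depth or when too few points remain.
        if _level == self.depth or len(descriptors) < self.min_points:
            self.leaf_id = counter[0]
            counter[0] += 1
            return self
        kmeans = KMeans(n_clusters=self.branch_factor, n_init=4).fit(descriptors)
        subtrees = []
        for c in range(self.branch_factor):
            child = VocabularyTree(self.branch_factor, self.depth, self.min_points)
            child.fit(descriptors[kmeans.labels_ == c], _level + 1, counter)
            subtrees.append(child)
        self.children = (kmeans, subtrees)
        return self

    def quantize(self, descriptor):
        """Map one feature vector to a discrete visterm (leaf) ID by descending the tree."""
        node = self
        while node.children is not None:
            kmeans, subtrees = node.children
            c = int(kmeans.predict(descriptor.reshape(1, -1))[0])
            node = subtrees[c]
        return node.leaf_id


# Example: build a small tree on random 128-d descriptors and quantize a new one.
rng = np.random.default_rng(0)
train = rng.normal(size=(5000, 128))            # stand-in for SIFT-like region features
tree = VocabularyTree(branch_factor=4, depth=3).fit(train)
print(tree.quantize(rng.normal(size=128)))      # prints a discrete visterm ID
```

The design point this sketch illustrates is that quantization cost grows with branch_factor times depth rather than with the total number of visterms, which is what makes vocabularies of this size practical where flat k-means is not.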