Compact representation for large-scale clustering and similarity search

  • Authors: Bin Wang; Yuanhao Chen; Zhiwei Li; Mingjing Li
  • Affiliations: University of Science and Technology of China; University of Science and Technology of China; Microsoft Research Asia; Microsoft Research Asia
  • Venue: PCM'06 Proceedings of the 7th Pacific Rim conference on Advances in Multimedia Information Processing
  • Year: 2006

Abstract

Although content-based image retrieval has been researched for many years, few content-based methods are implemented in present-day image search engines. This is partly because of the great difficulty of indexing and searching high-dimensional feature spaces for large-scale image datasets. In this paper, we propose a novel method that represents the content of each image as one or more hash codes, which can be treated as special keywords. Based on this compact representation, images can be accessed very quickly by their visual content. Furthermore, two advanced functionalities are implemented. One is content-based image clustering, which is simplified to grouping images with identical or near-identical hash codes. The other is content-based similarity search, which is approximated by finding images with similar hash codes. The hash code extraction process is very simple, and both image clustering and similarity search can be performed in real time. Experiments on over 11 million images collected from the web demonstrate the efficiency and effectiveness of the proposed method.
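
To make the two operations in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the hash codes are extracted or how similarity between codes is measured, so the sketch assumes each image already has a fixed-length binary code stored as an integer, and it assumes Hamming distance as the similarity measure. The function names, file names, and the `max_distance` parameter are hypothetical.

```python
from collections import defaultdict


def cluster_by_code(codes):
    """Group image ids that share an identical hash code.

    `codes` maps an image id to its hash code (assumed here to be an int).
    """
    groups = defaultdict(list)
    for image_id, code in codes.items():
        groups[code].append(image_id)
    return dict(groups)


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def similar_images(query_code, codes, max_distance=2):
    """Return image ids whose code is within `max_distance` bits of the query.

    Hamming distance is an assumption of this sketch; results are ordered
    by distance, then by image id.
    """
    hits = []
    for image_id, code in codes.items():
        distance = hamming(query_code, code)
        if distance <= max_distance:
            hits.append((distance, image_id))
    return [image_id for _, image_id in sorted(hits)]


# Toy 8-bit codes for four hypothetical images.
codes = {
    "a.jpg": 0b10110010,
    "b.jpg": 0b10110010,  # identical to a.jpg: same cluster
    "c.jpg": 0b10110011,  # one bit away from a.jpg
    "d.jpg": 0b01001101,  # every bit differs from a.jpg
}

clusters = cluster_by_code(codes)
neighbours = similar_images(0b10110010, codes, max_distance=1)
```

Clustering by exact code is a single dictionary pass, which is what makes treating codes as keywords attractive. The similarity search here is a linear scan for clarity only; at the scale the abstract reports, some index over the codes would be needed, and the abstract does not describe which one the authors use.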