Building descriptive and discriminative visual codebook for large-scale image applications

  • Authors:
  • Qi Tian;Shiliang Zhang;Wengang Zhou;Rongrong Ji;Bingbing Ni;Nicu Sebe

  • Affiliations:
  • Computer Science Department, University of Texas at San Antonio, San Antonio, USA 78249;Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100190;EEIS Department, University of Science and Technology of China, Hefei, China 230027;Harbin Institute of Technology, Harbin, China 150001;National University of Singapore, Singapore, Singapore 117576;Department of Information Engineering and Computer Science, University of Trento, Trento, Italy

  • Venue:
  • Multimedia Tools and Applications
  • Year:
  • 2011


Abstract

Inspired by the success of textual words in large-scale textual information processing, researchers are trying to extract visual words from images that function similarly to textual words. Visual words are commonly generated by clustering a large number of local image features and taking the cluster centers as visual words. This approach is simple and scalable, but it results in noisy visual words. Many works have been reported that attempt to improve the descriptive and discriminative ability of visual words. This paper gives a comprehensive survey of visual vocabularies and details several state-of-the-art algorithms. A comprehensive review and summary of the related work on visual vocabularies is first presented. Then, we introduce our recent algorithms for descriptive and discriminative visual word generation, i.e., latent visual context analysis for descriptive visual word identification [74], descriptive visual word and visual phrase generation [68], a contextual visual vocabulary that combines both semantic contexts and spatial contexts [69], and visual vocabulary hierarchy optimization [18]. Additionally, we introduce two interesting post-processing strategies to further improve the performance of visual vocabularies: spatial coding [73] is proposed to efficiently remove mismatched visual words between images for more reasonable image similarity computation, and user-preference-based visual word weighting [44] is developed to make the image similarity computed from visual words more consistent with users' preferences or habits.
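The codebook-construction step the abstract describes (cluster many local descriptors, take cluster centers as visual words, then quantize each image's descriptors into a bag-of-visual-words histogram) can be sketched as below. This is a minimal NumPy k-means illustration of the generic pipeline, not the paper's own method; the function names, parameters, and random descriptor data are all hypothetical.

```python
import numpy as np

def build_codebook(features, k=16, iters=20, seed=0):
    """Cluster local descriptors with plain k-means; the k cluster
    centers play the role of visual words.

    features: (n, d) float array of local descriptors (e.g. SIFT).
    Illustrative sketch only; real systems use far larger k.
    """
    rng = np.random.default_rng(seed)
    # initialize centers with k randomly chosen descriptors
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            pts = features[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def quantize(features, codebook):
    """Map descriptors to visual-word IDs and build the
    bag-of-visual-words histogram for one image."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    word_ids = d2.argmin(1)
    return np.bincount(word_ids, minlength=len(codebook))

# Hypothetical usage with synthetic descriptors:
rng = np.random.default_rng(1)
descriptors = rng.normal(size=(200, 8))
codebook = build_codebook(descriptors, k=16)
histogram = quantize(descriptors, codebook)
```

Hard assignment to the single nearest center is exactly what makes the resulting words "noisy": descriptors near a cluster boundary get mapped to different words under small perturbations, which motivates the descriptive/discriminative refinements the survey covers.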