Towards information-theoretic K-means clustering for image indexing

  • Authors:
  • Jie Cao;Zhiang Wu;Junjie Wu;Wenjie Liu

  • Affiliations:
  • Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China;Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China;Department of Information Systems, School of Economics and Management, Beihang University, Beijing, China;Department of Computer Science & Technology, Nanjing University, Nanjing, China

  • Venue:
  • Signal Processing
  • Year:
  • 2013

Abstract

Information-theoretic K-means (Info-Kmeans) aims to cluster high-dimensional data, such as images represented by the bag-of-features (BOF) model, using the K-means algorithm with KL-divergence as the distance measure. While research along this line has shown promising results, a remaining challenge is the high sparsity of image data. Indeed, the centroids may contain many zero-value features, which create a dilemma when assigning objects to centroids during the iterative process of Info-Kmeans. To meet this challenge, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. Specifically, SAIL avoids the zero-feature dilemma by replacing the computation of KL-divergence between instances and centroids with the computation of centroid entropies only. To further improve clustering quality, we also incorporate the Variable Neighborhood Search (VNS) meta-heuristic and propose the V-SAIL algorithm. Experimental results on various benchmark data sets demonstrate the effectiveness of SAIL and V-SAIL. In particular, they successfully recognize nine out of 11 landmarks from extremely high-dimensional and sparse image vectors in the presence of severe noise.
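The idea of scoring assignments through centroid entropies rests on a standard identity, illustrated here with a small sketch. Assume each image is a nonnegative count vector x (e.g. BOF visual-word counts) with weight w_x = sum(x) and distribution p_x = x / w_x. For a cluster with summed vector S_k, weight W_k and centroid c_k = S_k / W_k, the Info-Kmeans objective sum_x w_x KL(p_x || c_k) equals sum_k W_k H(c_k) minus sum_x w_x H(p_x). The second term does not depend on the partition, and W_k H(c_k) = W_k log W_k - sum_j S_kj log S_kj needs only the summed vector. KL(p_x || c_b) is infinite whenever a candidate centroid c_b is zero in a feature where x is nonzero. The sketch avoids this by scoring a move of x from cluster a to cluster b as the change in cost of the two summed vectors, so x is always part of the vector being scored. This is a minimal illustration under these assumptions, not the authors' SAIL or V-SAIL implementation. The names cluster_cost, local_search, sail and v_sail and the parameters max_iter, s_max, rounds and seed are hypothetical. The VNS wrapper is a generic shake-then-local-search scheme.

```python
import math
import random


def cluster_cost(s):
    """Return W * H(c) for a cluster with summed count vector s.

    W = sum(s) is the cluster weight and c = s / W its centroid. Zero entries
    contribute nothing (0 * log 0 = 0), so sparse features cause no infinities.
    """
    total = sum(s)
    if total <= 0:
        return 0.0
    return total * math.log(total) - sum(v * math.log(v) for v in s if v > 0)


def local_search(data, labels, k, max_iter=100):
    """Summation-based incremental moves (sketch); returns (labels, objective)."""
    d = len(data[0])
    labels = list(labels)
    # Maintain only per-cluster summed vectors and their entropy-based costs.
    sums = [[0.0] * d for _ in range(k)]
    for x, c in zip(data, labels):
        for j, v in enumerate(x):
            sums[c][j] += v
    costs = [cluster_cost(s) for s in sums]
    for _ in range(max_iter):
        moved = False
        for i, x in enumerate(data):
            a = labels[i]
            s_minus = [u - v for u, v in zip(sums[a], x)]
            cost_minus = cluster_cost(s_minus)
            best_b, best_delta, best_plus = a, 0.0, None
            for b in range(k):
                if b == a:
                    continue
                s_plus = [u + v for u, v in zip(sums[b], x)]
                cost_plus = cluster_cost(s_plus)
                # Change in sum_k W_k H(c_k) if x moves from cluster a to b.
                delta = cost_minus + cost_plus - costs[a] - costs[b]
                if delta < best_delta - 1e-12:
                    best_b, best_delta, best_plus = b, delta, (s_plus, cost_plus)
            if best_b != a:
                # Apply the best strictly improving move immediately (incremental).
                sums[a], costs[a] = s_minus, cost_minus
                sums[best_b], costs[best_b] = best_plus
                labels[i] = best_b
                moved = True
        if not moved:  # a full pass without moves: local optimum reached
            break
    return labels, sum(costs)


def sail(data, k, seed=0):
    """Random initial assignment followed by the incremental local search."""
    rng = random.Random(seed)
    init = [rng.randrange(k) for _ in data]
    return local_search(data, init, k)


def v_sail(data, k, s_max=3, rounds=10, seed=0):
    """Generic VNS wrapper around local_search (hypothetical parameters)."""
    rng = random.Random(seed)
    best_labels, best_obj = sail(data, k, seed)
    for _ in range(rounds):
        s = 1
        while s <= s_max:
            # Shake: reassign s randomly chosen objects to random clusters.
            shaken = list(best_labels)
            for i in rng.sample(range(len(data)), min(s, len(data))):
                shaken[i] = rng.randrange(k)
            labels, obj = local_search(data, shaken, k)
            if obj < best_obj - 1e-12:
                best_labels, best_obj = labels, obj
                s = 1  # improvement: return to the smallest neighbourhood
            else:
                s += 1  # no improvement: try a larger neighbourhood
    return best_labels, best_obj


if __name__ == "__main__":
    # Toy sparse count vectors: two groups with disjoint feature supports.
    docs = [[5, 3, 0, 0], [4, 4, 0, 0], [6, 2, 0, 0],
            [0, 0, 5, 3], [0, 0, 3, 5], [0, 0, 4, 4]]
    print(v_sail(docs, k=2))
```

On the toy data, the first three rows and the last three rows end up in different clusters. Mixing the two disjoint feature supports in one cluster raises its centroid entropy far more than any within-group split lowers it. The sketch keeps only summed vectors and their costs, so each candidate move costs two entropy evaluations and never needs a per-instance KL-divergence against a possibly zero-valued centroid.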