Towards information-theoretic K-means clustering for image indexing

Authors:
Jie Cao; Zhiang Wu; Junjie Wu; Wenjie Liu
Affiliations:
Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China; Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China; Department of Information Systems, School of Economics and Management, Beihang University, Beijing, China; Department of Computer Science & Technology, Nanjing University, Nanjing, China
Venue:
Signal Processing
Year:
2013

Abstract

Information-theoretic K-means (Info-Kmeans) aims to cluster high-dimensional data, such as images represented by the bag-of-features (BOF) model, using the K-means algorithm with the KL-divergence as the distance measure. While research along this line has shown promising results, a remaining challenge is the high sparsity of image data. Centroids may contain many zero-value features, and these create a dilemma when objects are assigned to centroids during the iterative process of Info-Kmeans: the KL-divergence from an object to a centroid is infinite whenever the centroid is zero on a feature where the object is nonzero. To meet this challenge, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. SAIL avoids the zero-feature dilemma by replacing the computation of the KL-divergence between instances and centroids with the computation of centroid entropies only. To further improve clustering quality, we also introduce the Variable Neighborhood Search (VNS) meta-heuristic and propose the V-SAIL algorithm. Experimental results on various benchmark data sets demonstrate the effectiveness of SAIL and V-SAIL. In particular, the two algorithms recognize nine out of 11 landmarks from extremely high-dimensional and sparse image vectors in the presence of severe noise.
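The Python sketch below illustrates the idea under stated assumptions. It is not the authors' implementation. It relies on a standard identity for weighted-mean centroids m_k = (sum of pi_x * x over C_k) / pi_k: sum_k sum_{x in C_k} pi_x * D(x || m_k) = sum_k pi_k * H(m_k) - sum_x pi_x * H(x). The last term does not depend on the clustering, so minimizing the Info-Kmeans objective is equivalent to minimizing the weighted centroid entropies. We assume this is the decomposition SAIL exploits. The functions `kl`, `entropy`, `cluster_cost` and `sail` are hypothetical names. Instances are assumed to be probability vectors (for example, normalized BOF histograms) with weights pi_x. The move rule, the ban on emptying a cluster, and the stopping criterion are illustrative choices. V-SAIL's VNS layer is not shown.

```python
import math


def kl(x, m):
    """KL divergence D(x || m) in nats for discrete distributions x and m.

    Returns math.inf when m is zero at a feature where x is positive: the
    zero-feature dilemma when x is compared with a sparse centroid.
    """
    d = 0.0
    for xj, mj in zip(x, m):
        if xj > 0:
            if mj == 0:
                return math.inf
            d += xj * math.log(xj / mj)
    return d


def entropy(p):
    """Shannon entropy H(p) in nats, using the convention 0 * log 0 = 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)


def cluster_cost(s):
    """pi_k * H(m_k), computed from the summation vector s = sum of pi_x * x.

    With pi_k = sum(s) and m_k = s / pi_k, this equals
    pi_k * log(pi_k) - sum_j s_j * log(s_j). Only centroid statistics are
    used, so no instance-to-centroid KL term (and no log 0) ever appears.
    """
    pi_k = sum(s)
    if pi_k <= 0:  # empty cluster (or round-off below zero)
        return 0.0
    return pi_k * math.log(pi_k) - sum(sj * math.log(sj) for sj in s if sj > 0)


def sail(data, weights, labels, k, max_passes=20):
    """Hypothetical sketch of summation-based incremental Info-Kmeans.

    data: instances as probability vectors; weights: pi_x per instance;
    labels: initial cluster index per instance. Each instance is moved to
    the cluster that most decreases sum_k pi_k * H(m_k). The loop stops
    after a pass with no moves. Returns a new list of labels.
    """
    dim = len(data[0])
    labels = list(labels)
    sums = [[0.0] * dim for _ in range(k)]  # summation vector per cluster
    counts = [0] * k
    for x, w, c in zip(data, weights, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += w * x[j]
    for _ in range(max_passes):
        moved = False
        for i, (x, w) in enumerate(zip(data, weights)):
            src = labels[i]
            if counts[src] == 1:  # keep every cluster non-empty
                continue
            wx = [w * xj for xj in x]
            # Clamp at 0 to absorb floating-point round-off.
            src_without = [max(a - b, 0.0) for a, b in zip(sums[src], wx)]
            removal_gain = cluster_cost(sums[src]) - cluster_cost(src_without)
            best, best_delta = src, 0.0
            for dst in range(k):
                if dst == src:
                    continue
                dst_with = [a + b for a, b in zip(sums[dst], wx)]
                # Change in the objective if x moves from src to dst.
                delta = cluster_cost(dst_with) - cluster_cost(sums[dst]) - removal_gain
                if delta < best_delta - 1e-12:
                    best, best_delta = dst, delta
            if best != src:
                sums[src] = src_without
                sums[best] = [a + b for a, b in zip(sums[best], wx)]
                counts[src] -= 1
                counts[best] += 1
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels
```

Consider two sparse instances on features 0 and 1 and two on features 2 and 3. Under plain KL, an instance from the first pair has infinite divergence to the second pair's centroid. Starting from the mixed assignment `[0, 1, 0, 1]`, `sail` still reassigns instances using only the entropy-based costs and groups each pair together. Each accepted move strictly lowers the objective, so the loop terminates.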