Information-theoretic K-means (Info-Kmeans) aims to cluster high-dimensional data, such as images represented by the bag-of-features (BOF) model, using the K-means algorithm with KL-divergence as the distance measure. Although research along this line has shown promising results, a remaining challenge is the high sparsity of image data. Indeed, the centroids may contain many zero-value features, which create a dilemma when assigning objects to centroids during the iterative process of Info-Kmeans: if a centroid is zero on a feature where an object is nonzero, the KL-divergence from that object to the centroid is infinite. To meet this challenge, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. Specifically, SAIL avoids the zero-feature dilemma by replacing the computation of KL-divergences between instances and centroids with the computation of centroid entropies only. To further improve clustering quality, we also introduce the Variable Neighborhood Search (VNS) meta-heuristic and propose the V-SAIL algorithm. Experimental results on various benchmark data sets demonstrate the effectiveness of SAIL and V-SAIL. In particular, they successfully recognize nine out of 11 landmarks from extremely high-dimensional and sparse image vectors in the presence of severe noise.
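The idea of replacing instance-centroid KL-divergences with centroid entropies can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' SAIL implementation. It assumes each instance is a nonnegative sparse count vector x (for example, BOF visual-word counts) treated as the distribution x/|x| with weight |x|, its total mass. Under that assumption, the sum over a cluster C_k of |x| D(x/|x| || m_k) equals T_k H(m_k) minus the sum of |x| H(x/|x|), where T_k is the cluster's total mass and m_k its weighted centroid. Because the second term does not depend on the clustering, minimizing the Info-Kmeans objective is equivalent to minimizing the sum over k of T_k H(m_k). Each cluster keeps only a summation vector s_k and a total T_k, and T_k H(m_k) = T_k log T_k minus the sum over j of s_kj log s_kj, so no logarithm of a zero centroid feature is ever evaluated. The names `sail`, `add_cost`, `remove_saving`, and `objective`, the one-instance-at-a-time move rule, and the toy data are assumptions made for illustration. The VNS step of V-SAIL is not shown.

```python
import math
from collections import defaultdict


def xlogx(v):
    """v * log(v), with the convention 0 * log(0) = 0."""
    return v * math.log(v) if v > 0 else 0.0


def add_cost(s, total, x, mass):
    """Increase of T*H(m) = T log T - sum_j s_j log s_j when sparse vector x
    (total mass `mass`) joins a cluster with summation vector s and total T.
    Only the features present in x are touched."""
    inc = xlogx(total + mass) - xlogx(total)
    for j, v in x.items():
        old = s.get(j, 0.0)
        inc -= xlogx(old + v) - xlogx(old)
    return inc


def remove_saving(s, total, x, mass):
    """Decrease of T*H(m) when x, currently a member, leaves the cluster."""
    dec = xlogx(total) - xlogx(total - mass)
    for j, v in x.items():
        cur = s[j]
        dec -= xlogx(cur) - xlogx(cur - v)
    return dec


def objective(data, labels, k):
    """Sum_k T_k*H(m_k): the weighted KL objective plus a clustering-independent constant."""
    sums = [defaultdict(float) for _ in range(k)]
    totals = [0.0] * k
    for x, c in zip(data, labels):
        for j, v in x.items():
            sums[c][j] += v
        totals[c] += sum(x.values())
    return sum(xlogx(t) - sum(xlogx(v) for v in s.values())
               for s, t in zip(sums, totals))


def sail(data, k, init, max_iter=20):
    """Hypothetical SAIL-style pass: move one instance at a time to the
    cluster that most lowers the weighted sum of centroid entropies."""
    labels = list(init)
    masses = [sum(x.values()) for x in data]
    sums = [defaultdict(float) for _ in range(k)]  # per-cluster summation vectors
    totals = [0.0] * k
    for x, m, c in zip(data, masses, labels):
        for j, v in x.items():
            sums[c][j] += v
        totals[c] += m
    for _ in range(max_iter):
        moved = False
        for i, x in enumerate(data):
            a, m = labels[i], masses[i]
            # Staying put costs exactly what removing x from its cluster saves.
            best, best_cost = a, remove_saving(sums[a], totals[a], x, m)
            for b in range(k):
                if b != a:
                    cost = add_cost(sums[b], totals[b], x, m)
                    if cost < best_cost - 1e-12:
                        best, best_cost = b, cost
            if best != a:
                # Update both summation vectors incrementally.
                for j, v in x.items():
                    sums[a][j] -= v
                    sums[best][j] += v
                totals[a] -= m
                totals[best] += m
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels


# Toy "bag-of-features" counts: two groups with disjoint vocabularies.
data = [{0: 3, 1: 1}, {0: 2, 1: 2}, {2: 4}, {2: 1, 3: 3}]
labels = sail(data, k=2, init=[0, 1, 0, 1])
print(labels)  # [1, 1, 0, 0]
```

The move test compares the cost of adding x to each other cluster against the saving from removing it from its own, so every accepted move lowers the weighted centroid entropy by more than a small tolerance and the assignments cannot cycle. Each evaluation touches only the features present in x, which keeps the update cheap on sparse data.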