This paper proposes a new perspective on non-parametric entropy-based clustering. We develop a new cost function for clustering that measures the cross information potential (CIP) between clusters on a dataset using representative points, which we call representative CIP (rCIP). The approach rests on the idea that optimizing the cross information potential is equivalent to minimizing the cross entropy between clusters. Our measure differs from the standard CIP in that, instead of using all points in a dataset, it uses only representative points to quantify the interaction between distributions, without losing the original properties of the cross information potential. This brings a double advantage: it decreases the cost of computing the cross information potential, drastically reducing the running time, and it uses the underlying statistics of the region of space where the representative points lie to measure interaction. The result is a useful non-parametric estimator of entropy that makes it possible to use the cross information potential in applications where it previously could not be used. Due to the nature of clustering problems, we propose a genetic algorithm that uses rCIP as its cost function. We ran several tests and compared the results with the single-linkage hierarchical algorithm, a finite mixture of Gaussians, and spectral clustering on both synthetic and real image segmentation datasets. The experiments showed that our approach achieved better results than the other algorithms and was capable of capturing the real structure of the data in most cases, regardless of its complexity. It also produced good image segmentations, with the advantage of a tuning parameter that provides a way of refining the segmentation.
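For orientation, the quantity that rCIP approximates can be sketched with the Parzen-window estimator of the CIP commonly used in information-theoretic learning. The abstract does not state the estimator or the construction of the representative points, so the Gaussian kernel, the bandwidth `sigma`, and the function name below are assumptions for illustration only, not the authors' implementation. The sketch evaluates every pair of points across the two clusters, which is the cost in the product of the cluster sizes that motivates replacing the full point sets with a smaller set of representatives.

```python
import numpy as np


def cross_information_potential(a, b, sigma=1.0):
    """Parzen-window estimate of the CIP between two point sets.

    a, b  : arrays of shape (n_a, d) and (n_b, d).
    sigma : kernel bandwidth (an assumed free parameter).

    Uses an isotropic Gaussian kernel with variance 2 * sigma**2, the
    result of convolving two Gaussian Parzen kernels of width sigma.
    """
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    d = a.shape[1]

    # Squared Euclidean distance between every point of a and every point of b.
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-d / 2.0)

    # Average kernel value over all n_a * n_b cross pairs.
    return float(np.mean(norm * np.exp(-sq_dist / (2.0 * var))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cluster = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
    near = rng.normal(loc=1.0, scale=0.5, size=(200, 2))
    far = rng.normal(loc=10.0, scale=0.5, size=(200, 2))
    print("CIP with nearby cluster: ", cross_information_potential(cluster, near))
    print("CIP with distant cluster:", cross_information_potential(cluster, far))
```

Under these assumptions, well-separated clusters yield a CIP close to zero and overlapping clusters yield a larger value, so a clustering cost built on it would favor partitions whose clusters interact weakly.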