Algorithms for clustering data
Algorithms for clustering data
Elements of information theory
Elements of information theory
Bayesian classification (AutoClass): theory and results
Advances in knowledge discovery and data mining
Data mining: practical machine learning tools and techniques with Java implementations
Data mining: practical machine learning tools and techniques with Java implementations
Model Selection in Unsupervised Learning with Applications To Document Clustering
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Pattern Classification (2nd Edition)
Pattern Classification (2nd Edition)
Pattern Recognition, Third Edition
Pattern Recognition, Third Edition
An information-theoretic external cluster-validity measure
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
IEEE Transactions on Fuzzy Systems
Hi-index | 0.00 |
Automated tools for knowledge discovery are frequently invoked in databases where objects already group into some known (i.e., external) classification scheme. In the context of unsupervised learning or clustering, such tools delve inside large databases looking for alternative classification schemes that are meaningful and novel. An assessment of the information gained with new clusters can be effected by looking at the degree of separation between each new cluster and its most similar class. Our approach models each cluster and class as a multivariate Gaussian distribution and estimates their degree of separation through an information theoretic measure (i.e., through relative entropy or Kullback---Leibler distance). The inherently large computational cost of this step is alleviated by first projecting all data over the single dimension that best separates both distributions (using Fisher's Linear Discriminant). We test our algorithm on a dataset of Martian surfaces using the traditional division into geological units as external classes and the new, hydrology-inspired, automatically performed division as novel clusters. We find the new partitioning constitutes a formally meaningful classification that deviates substantially from the traditional classification.