Although data clustering has a long history and a large body of research has produced numerous clustering techniques, significant challenges remain. One of the most important is high data dimensionality. A particular class of clustering algorithms has been very successful on such datasets by utilising information derived from principal component analysis. In this work, we try to deepen our understanding of what this kind of approach can achieve. We investigate theoretically the relationship between the true clusters in the data and the distribution of their projections onto the principal components. Based on these findings, we propose appropriate criteria for the various steps involved in hierarchical divisive clustering and combine them into new algorithms. The proposed algorithms require minimal user-defined parameters and have the desirable feature of providing approximations of the number of clusters present in the data. The experimental results indicate that the proposed techniques are effective on both simulated and real data.
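To make the idea of splitting a cluster along a principal component concrete, the following is a minimal sketch of a single divisive step. It assumes the simplest known split criterion, the one used in basic principal direction divisive partitioning: project the points onto the first principal direction and cut at the mean. This is not the set of criteria proposed in this work, which the abstract does not specify. The function names `principal_direction` and `split_by_principal_direction` are hypothetical, and the power iteration is only a stand-in for a proper PCA routine.

```python
import math

def principal_direction(points, iters=200):
    """Return (mean, v), where v approximates the leading eigenvector of the
    sample covariance. Uses plain power iteration (an illustrative choice)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Arbitrary start vector; power iteration would fail if this happened
    # to be orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = X^T (X v), which is proportional to (covariance matrix) * v
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: no direction to find
            break
        v = [x / norm for x in w]
    return mean, v

def split_by_principal_direction(points):
    """One divisive step: cut at the mean of the 1-D projections
    (assumed criterion, taken from basic PDDP)."""
    mean, v = principal_direction(points)
    left, right = [], []
    for p in points:
        # Signed projection of the centred point onto the principal direction
        score = sum((p[j] - mean[j]) * v[j] for j in range(len(v)))
        (left if score <= 0 else right).append(p)
    return left, right

# Two well-separated groups along the first coordinate
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (10.0, 0.1), (10.2, -0.1), (9.9, 0.0)]
left, right = split_by_principal_direction(points)
```

A hierarchical divisive algorithm would apply such a step repeatedly, at each stage choosing which cluster to split next and deciding when to stop. Those choices are exactly the steps for which the abstract says new criteria are proposed, so they are left out of this sketch.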