Enhancing principal direction divisive clustering

Authors:
S. K. Tasoulis;D. K. Tasoulis;V. P. Plagianakos
Affiliations:
Department of Computer Science and Biomedical Informatics, University of Central Greece, 2-4 Papassiopoulou Str., Lamia 35100, Greece;Mathematics Department, Imperial College London, 180 Queen's Gate, SW7 2AZ, UK;Department of Computer Science and Biomedical Informatics, University of Central Greece, 2-4 Papassiopoulou Str., Lamia 35100, Greece
Venue:
Pattern Recognition
Year:
2010

Abstract

Although data clustering has a long history and a large body of research has been devoted to developing clustering techniques, significant challenges remain. One of the most important is high data dimensionality. A particular class of clustering algorithms has been very successful in dealing with such datasets by using information derived from principal component analysis. In this work, we try to deepen our understanding of what can be achieved by this kind of approach. We investigate theoretically the relationship between the true clusters in the data and the distribution of their projections onto the principal components. Based on these findings, we propose appropriate criteria for the various steps involved in hierarchical divisive clustering and combine them into new algorithms. The proposed algorithms require minimal user-defined parameters and have the desirable feature of providing estimates of the number of clusters present in the data. The experimental results indicate that the proposed techniques are effective on both simulated and real data.
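For orientation, the class of algorithms the abstract refers to can be illustrated with a minimal sketch of the basic principal direction divisive scheme: project a cluster onto its first principal direction and split it at the projected mean. This is not the method proposed in the paper, whose criteria are not given in the abstract. The function names, the choice of NumPy, the rule of always splitting the cluster with the largest scatter, and the fixed n_clusters argument are all assumptions made for illustration.

```python
import numpy as np


def principal_projection(points):
    """Project mean-centered rows of `points` onto the first principal direction."""
    centered = points - points.mean(axis=0)
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


def divisive_clustering(X, n_clusters):
    """Illustrative principal-direction divisive clustering; returns integer labels.

    Assumption: the cluster with the largest scatter is split next, and the
    split point is the projected mean (zero after centering).
    """
    X = np.asarray(X, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, n_clusters):
        # Scatter = sum of squared distances to the cluster centroid.
        scatter = []
        for k in range(new_label):
            pts = X[labels == k]
            if len(pts) < 2:
                scatter.append(0.0)  # nothing left to split in this cluster
            else:
                scatter.append(((pts - pts.mean(axis=0)) ** 2).sum())
        target = int(np.argmax(scatter))
        idx = np.flatnonzero(labels == target)
        projection = principal_projection(X[idx])
        # Points on the positive side of the projected mean form the new cluster.
        labels[idx[projection > 0]] = new_label
    return labels
```

A possible call, on data of the user's own choosing, is labels = divisive_clustering(X, 3). Unlike this sketch, which needs the number of clusters up front, the abstract states that the proposed algorithms can estimate that number from the data.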