CLUMP: A Scalable and Robust Framework for Structure Discovery

Authors:
Kunal Punera; Joydeep Ghosh
Affiliations:
University of Texas at Austin; University of Texas at Austin
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Abstract

We introduce a robust and efficient framework called CLUMP (CLustering Using Multiple Prototypes) for unsupervised discovery of structure in data. CLUMP relies on finding multiple prototypes that summarize the data. Clustering the prototypes enables our algorithm to scale up to extremely large and high-dimensional domains such as text data. Other desirable properties include robustness to noise and parameter choices. In this paper, we describe the approach in detail, characterize its performance on a variety of datasets, and compare it to some existing model selection approaches.
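
The two-stage idea in the abstract (summarize the data with multiple prototypes, then cluster the prototypes) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes plain k-means as a stand-in for finding prototypes and single-link agglomerative merging as a stand-in for clustering them, and it takes the number of clusters as an input rather than performing any model selection. All function and parameter names (`kmeans`, `merge_prototypes`, `cluster_via_prototypes`, `n_prototypes`, `n_clusters`) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in, not the paper's prototype finder)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a prototype went empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final assignment against the final centroids.
    return centroids, [nearest(p) for p in points]


def merge_prototypes(prototypes, n_clusters):
    """Single-link agglomerative merging of prototypes (assumed choice)."""
    groups = [[i] for i in range(len(prototypes))]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single-link distance: closest pair across the two groups.
                d = min(math.dist(prototypes[i], prototypes[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups.pop(b))
    # Map each prototype index to the id of the group it ended up in.
    return {i: g for g, members in enumerate(groups) for i in members}


def cluster_via_prototypes(points, n_prototypes, n_clusters, seed=0):
    """Cluster points by clustering their prototypes, not the points themselves."""
    prototypes, assign = kmeans(points, n_prototypes, seed=seed)
    proto_label = merge_prototypes(prototypes, n_clusters)
    # Each point inherits the cluster label of its prototype.
    return [proto_label[a] for a in assign]
```

The expensive pairwise merging step runs over `n_prototypes` items rather than over every data point, which is the kind of saving the abstract attributes to clustering prototypes. For example, `cluster_via_prototypes(points, n_prototypes=6, n_clusters=2)` takes a list of coordinate tuples and returns one cluster label per point.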