Clustering large datasets using cobweb and k-means in tandem

Authors:
Mi Li; Geoffrey Holmes; Bernhard Pfahringer
Affiliations:
Department of Computer Science, University of Waikato, Hamilton, New Zealand (all authors)
Venue:
AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
Year:
2004

Abstract

This paper presents a single-scan algorithm for clustering large datasets, based on a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared to a single-pass K-means algorithm and to standard K-means. The evaluation considers error, measured by the sum of squared error, and vulnerability to the order in which data points are processed.
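
The second phase and the error measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the modified Cobweb phase is not shown, and the sketch assumes that each leaf subcluster has already been summarised as a centroid plus a point count. The names weighted_kmeans, sse, leaf_centroids and leaf_counts are hypothetical, and weighting each centroid by its count is an assumption about how K-means might be run over subclusters.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(centroids, weights, k, iters=50, seed=0):
    """Lloyd-style K-means over subcluster centroids.

    Each centroid counts as `weights[i]` points, so a cluster centre is the
    weighted mean of the subcluster centroids assigned to it (assumption).
    """
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(centroids, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centre for every subcluster centroid.
        new_assign = [
            min(range(k), key=lambda j: sq_dist(c, centers[j]))
            for c in centroids
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centres are too
        assign = new_assign
        # Update step: weighted mean of the members of each cluster.
        for j in range(k):
            members = [i for i, a in enumerate(assign) if a == j]
            total = sum(weights[i] for i in members)
            if total > 0:  # keep the old centre if a cluster is empty
                dim = len(centers[j])
                centers[j] = [
                    sum(weights[i] * centroids[i][d] for i in members) / total
                    for d in range(dim)
                ]
    return centers, assign


def sse(points, centers):
    """Sum of squared error: each point against its nearest centre."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical leaf summaries: (centroid, number of points in the leaf).
leaf_centroids = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
leaf_counts = [3, 1, 1, 3]

centers, assign = weighted_kmeans(leaf_centroids, leaf_counts, k=2)
print(sorted(centers))
print(sse([(0.0, 0.0), (10.0, 12.0)], centers))
```

Running K-means over a handful of leaf summaries rather than over every data point is what would keep the second phase cheap after a single scan of the data. The sum of squared error is still computed against the original points.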