Clustering large datasets using cobweb and k-means in tandem

  • Authors: Mi Li; Geoffrey Holmes; Bernhard Pfahringer

  • Affiliation (all authors): Department of Computer Science, University of Waikato, Hamilton, New Zealand

  • Venue: AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
  • Year: 2004

Abstract

This paper presents a single-scan algorithm for clustering large datasets, based on a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared with a single-pass K-means algorithm and with standard K-means. The evaluation considers error, as measured by the sum of squared error, and vulnerability to the order in which data points are processed.