Hierarchical model-based clustering of large datasets through fractionation and refractionation

Authors:
Jeremy Tantrum;Alejandro Murua;Werner Stuetzle
Affiliations:
Department of Statistics, University of Washington, Seattle, WA;Department of Statistics, University of Washington, Seattle, WA;Department of Statistics, University of Washington, Seattle, WA
Venue:
Information Systems - Knowledge discovery and data mining (KDD 2002)
Year:
2004

Citing 4

Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval

Using linear algebra for intelligent information retrieval
SIAM Review

OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Modeling the manifolds of images of handwritten digits
IEEE Transactions on Neural Networks

Cited 2

Graph nodes clustering with the sigmoid commute-time kernel: A comparative study
Data & Knowledge Engineering

Dempster-Shafer evidence theory based multi-characteristics fusion for clustering evaluation
RSKT'10 Proceedings of the 5th international conference on Rough set and knowledge technology

Abstract

The goal of clustering is to identify distinct groups in a dataset. Compared to non-parametric clustering methods like complete linkage, hierarchical model-based clustering has the advantage of offering a way to estimate the number of groups present in the data. However, its computational cost is quadratic in the number of items to be clustered, and it is therefore not applicable to large problems. We review an idea called Fractionation, originally conceived by Cutting, Karger, Pedersen and Tukey for non-parametric hierarchical clustering of large datasets, and describe an adaptation of Fractionation to model-based clustering. A further extension, called Refractionation, leads to a procedure that can be successful even in the difficult situation where there are large numbers of small groups.
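
The abstract does not spell out the steps of Fractionation, so the following is only a rough sketch of the general idea as it is usually described: split the data into fractions small enough to cluster directly, reduce each fraction to a smaller number of clusters, treat those clusters as new items, and repeat until a single fraction remains. Everything named here is an assumption made for illustration: the function names `agglomerate` and `fractionate`, the parameters `fraction_size` and `reduction`, and in particular the use of plain centroid-linkage merging in place of the model-based (likelihood-driven) merging the paper is about. Refractionation and the estimation of the number of groups are not shown.

```python
import random
from math import dist


def agglomerate(points, weights, k):
    """Repeatedly merge the two closest centroids until k clusters remain.

    Stand-in for the agglomerative step; the pairwise search makes it
    at least quadratic in the number of items, which is why it is only
    applied to small fractions.
    """
    clusters = [(list(p), w) for p, w in zip(points, weights)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, wi), (cj, wj) = clusters[i], clusters[j]
        # The merged centroid is the weighted mean of the two centroids.
        centroid = [(a * wi + b * wj) / (wi + wj) for a, b in zip(ci, cj)]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append((centroid, wi + wj))
    return clusters


def fractionate(points, fraction_size, reduction, k):
    """Cluster `points` into k groups, never clustering more than
    `fraction_size` items at once (hypothetical parameter names)."""
    if fraction_size < 2 or not 0 < reduction < 1:
        raise ValueError("need fraction_size >= 2 and 0 < reduction < 1")
    items = [(list(p), 1) for p in points]  # (centroid, weight) pairs
    while len(items) > fraction_size:
        reduced = []
        for start in range(0, len(items), fraction_size):
            fraction = items[start:start + fraction_size]
            target = max(1, int(len(fraction) * reduction))
            reduced.extend(agglomerate([c for c, _ in fraction],
                                       [w for _, w in fraction],
                                       target))
        items = reduced  # clusters become the items of the next pass
    return agglomerate([c for c, _ in items], [w for _, w in items], k)


# Usage: two well-separated synthetic groups of 60 points each.
random.seed(0)
data = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(60)]
data += [(random.gauss(10, 0.5), random.gauss(10, 0.5)) for _ in range(60)]
random.shuffle(data)
result = fractionate(data, fraction_size=20, reduction=0.25, k=2)
for centroid, weight in sorted(result):
    print([round(x, 2) for x in centroid], weight)
```

In this sketch each pass clusters at most `fraction_size` items at a time, so the quadratic step is never run on the full dataset. The fractions are taken in input order here; how items are assigned to fractions is a design choice that this sketch does not attempt to reproduce.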