Distributed hierarchical document clustering

  • Authors:
  • Debzani Deb;M. Muztaba Fuad;Rafal A. Angryk

  • Affiliations:
  • Department of Computer Science, Montana State University, Bozeman, MT;Department of Computer Science, Montana State University, Bozeman, MT;Department of Computer Science, Montana State University, Bozeman, MT

  • Venue:
  • ACST'06 Proceedings of the 2nd IASTED international conference on Advances in computer science and technology
  • Year:
  • 2006

Abstract

This paper investigates the applicability of a distributed clustering technique called RACHET [1] to organizing large sets of distributed text data. Although the authors of RACHET claim that the algorithm generates quality clusters for massive, high-dimensional data sets, the algorithm had not yet been evaluated on a well-known academic data set. This paper presents a performance analysis of the algorithm and tests its suitability for distributed document clustering. We use three widely known hierarchical algorithms to generate local clusters at each distributed repository, and RACHET is then applied to merge the distributed cluster hierarchies. We perform our own tests of the algorithm on standard document corpora [2], using popular cluster evaluation measures [3, 4], and discuss important implementation details.
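
The two-stage workflow described in the abstract (cluster locally at each repository, then merge the local results) can be illustrated with a minimal sketch. This is not the RACHET algorithm and not the authors' implementation: the centroid-linkage merge, the (centroid, size) summaries, the function names, and the toy 2-D "document vectors" are all assumptions chosen only to show the general shape of the idea, namely that each site ships compact cluster summaries rather than raw documents.

```python
from itertools import combinations
from math import dist


def weighted_centroid(items):
    """Size-weighted mean of (vector, weight) pairs."""
    total = sum(w for _, w in items)
    dim = len(items[0][0])
    return tuple(sum(v[i] * w for v, w in items) / total for i in range(dim))


def agglomerate(items, k):
    """Hypothetical stand-in for a hierarchical algorithm: repeatedly merge
    the two groups with the closest centroids until k groups remain.
    Returns a list of (centroid, size) summaries."""
    groups = [[item] for item in items]
    while len(groups) > k:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: dist(
                weighted_centroid(groups[ij[0]]),
                weighted_centroid(groups[ij[1]]),
            ),
        )
        groups[i].extend(groups[j])  # i < j, so deleting j keeps i valid
        del groups[j]
    return [(weighted_centroid(g), sum(w for _, w in g)) for g in groups]


def local_summaries(docs, k):
    """Stage 1 (per repository): cluster local documents, keep summaries only."""
    return agglomerate([(d, 1) for d in docs], k)


def merge_summaries(all_summaries, k):
    """Stage 2 (assumed merge step): cluster the summaries sent by all sites."""
    return agglomerate(all_summaries, k)


# Toy data: two repositories holding 2-D document vectors (illustrative only).
site_a = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
site_b = [(0.5, 0.5), (10.5, 10.5), (11.0, 10.0)]

summaries = local_summaries(site_a, 2) + local_summaries(site_b, 2)
merged = merge_summaries(summaries, 2)
sizes = sorted(size for _, size in merged)
```

In this sketch only four summaries cross the network instead of seven documents; in a real setting the local step would be one of the hierarchical algorithms evaluated in the paper, and the merge step would follow RACHET as defined in [1].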