Leightweight Document Clustering

  • Authors:
  • Sholom M. Weiss;Brian F. White;Chidanand Apté

  • Affiliations:
  • -;-;-

  • Venue:
  • PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

A lightweight document clustering method is described that operates in high dimensions, processes tens of thousands of documents and groups them into several thousand clusters, or by varying a single parameter, into a few dozen clusters. The method uses a reduced indexing view of the original documents, where only the k best keywords of each document are indexed. An efficient procedure for clustering is speci fied in two parts (a) compute k most similar documents for each document in the collection and (b) group the documents into clusters using these similarity scores. The method has been evaluated on a database of over 50,000 customer service problem reports that are reduced to 3,000 clusters and 5,000 exemplar documents. Results demonstrate efficient clustering performance with excellent group similarity measures.