An effective document clustering method using user-adaptable distance metrics

  • Authors:
  • Han-joon Kim; Sang-goo Lee

  • Affiliations:
  • Seoul National University, San 56-1, Shillim-dong, Gwanak-gu, Seoul, Korea; Seoul National University, San 56-1, Shillim-dong, Gwanak-gu, Seoul, Korea

  • Venue:
  • Proceedings of the 2002 ACM symposium on Applied computing
  • Year:
  • 2002

Abstract

Document clustering is inherently an unsupervised learning process that organizes document (or text) data into distinct groups without depending on pre-specified knowledge. However, real-world applications, such as building a topical hierarchy for a large document collection, need to perform clustering under various kinds of constraints. This paper presents a new type of supervised clustering to organize information in a way that reflects knowledge provided by a user. As a means by which external human knowledge can be incorporated into the clustering process, a quadratic form distance metric is employed that contains a weight matrix. Also, we propose a way of representing knowledge to guide the clustering process and a variant of the gradient descent search technique to find a user-specific weight matrix under the hierarchical clustering strategy.
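
For readers unfamiliar with the term, a quadratic form distance between two document vectors x and y is commonly written as d(x, y) = sqrt((x - y)^T W (x - y)), where W is the weight matrix. The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of that standard definition. The function name, the three-term vocabulary, and the matrix values are hypothetical, chosen to show how off-diagonal weights can pull two documents closer together than the plain Euclidean distance would.

```python
import math


def quadratic_form_distance(x, y, W):
    """Return sqrt((x - y)^T W (x - y)) for equal-length vectors x and y.

    W is a square weight matrix given as a list of rows. It is assumed to be
    symmetric positive semi-definite, so the value under the root is >= 0.
    """
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += diff[i] * W[i][j] * diff[j]
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


# Hypothetical term-weight vectors over the vocabulary (car, auto, bank).
doc_a = [1.0, 0.0, 0.0]
doc_b = [0.0, 1.0, 0.0]

# Identity weights reduce the metric to ordinary Euclidean distance.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical user-adapted weights: "car" and "auto" are treated as related.
user_w = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

d_plain = quadratic_form_distance(doc_a, doc_b, identity)
d_user = quadratic_form_distance(doc_a, doc_b, user_w)
print(d_plain, d_user)
```

With the identity matrix the two documents share no terms and sit far apart. With the hypothetical user matrix the same pair is considerably closer, which is the kind of effect a learned, user-specific weight matrix is meant to produce.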