Principal Direction Divisive Partitioning

  • Authors:
  • Daniel Boley

  • Affiliations:
  • Department of Computer Science and Engineering, University of Minnesota, 200 Union Street S.E., Rm 4-192, Minneapolis, MN 55455, USA. boley@cs.umn.edu

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1998

Abstract

We propose a new algorithm capable of partitioning a set of documents or other samples based on an embedding in a high-dimensional Euclidean space (i.e., in which every document is a vector of real numbers). The method is unusual in that it is divisive, as opposed to agglomerative, and operates by repeatedly splitting clusters into smaller clusters. The documents are assembled into a matrix which is very sparse. It is this sparsity that permits the algorithm to be very efficient. The performance of the method is illustrated with a set of text documents obtained from the World Wide Web. Some possible extensions are proposed for further investigation.
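
The divisive scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the splitting rule, so the code below makes assumptions: following the method's name, each split separates a cluster by the sign of its projection onto the leading principal direction of the centered data, and the cluster with the largest scatter is split next. The function names (`split_by_principal_direction`, `scatter`, `divisive_partition`) are hypothetical, and the sketch uses dense NumPy arrays, so it does not exploit the sparsity that the abstract credits for the method's efficiency.

```python
import numpy as np


def split_by_principal_direction(X):
    """Return two boolean masks over the rows of X.

    Rows are separated by the sign of their projection onto the leading
    principal direction of the centered data (assumed splitting rule).
    """
    centered = X - X.mean(axis=0)
    # The first right singular vector of the centered matrix is the
    # leading principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    return projection <= 0, projection > 0


def scatter(X):
    """Total squared distance of the rows of X from their centroid."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def divisive_partition(X, n_clusters):
    """Split the rows of X top-down until n_clusters groups exist.

    Returns a list of index arrays, one per cluster.
    """
    clusters = [np.arange(X.shape[0])]
    while len(clusters) < n_clusters:
        # Assumption: the cluster with the largest scatter is split next.
        target = max(range(len(clusters)),
                     key=lambda i: scatter(X[clusters[i]]))
        if scatter(X[clusters[target]]) == 0.0:
            break  # every remaining cluster is a single repeated point
        idx = clusters.pop(target)
        left, right = split_by_principal_direction(X[idx])
        clusters.extend([idx[left], idx[right]])
    return clusters


if __name__ == "__main__":
    # Two well-separated groups of toy "documents" in two dimensions.
    X = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10]], dtype=float)
    for members in divisive_partition(X, 2):
        print(sorted(members.tolist()))
```

Unlike an agglomerative method, which starts from singletons and merges, this procedure starts from one cluster holding every sample and only ever splits. For a large sparse document matrix, one would presumably avoid forming the centered matrix explicitly and compute only the leading singular vector with an iterative solver, which the dense sketch above does not attempt.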