Fast clustering algorithm for information organization

  • Authors:
  • Kwangcheol Shin; Sangyong Han

  • Affiliations:
  • Dept. of Computer Science and Engineering, Chung-Ang Univ., Seoul, Korea; Dept. of Computer Science and Engineering, Chung-Ang Univ., Seoul, Korea

  • Venue:
  • CICLing'03: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2003

Abstract

This study addresses information organization for more efficient searching and browsing of Internet documents. For this purpose, it proposes a heuristic algorithm that behaves similarly to the star clustering algorithm but has a lower time complexity: O(kn), with k ≪ n, instead of the O(n^2) of the star clustering algorithm. The proposed algorithm uses cosine similarity and takes the vectors with the most non-zero elements as its initial reference values. It is intended to carry out clustering on the basis of concept vectors and to produce clusters for information organization in O(kn) time. To measure how quickly it produces clusters, the algorithm is tested on the TIME and CLASSIC3 collections and compared with the star clustering algorithm.
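
The abstract gives only the ingredients of the procedure, so the following is a minimal sketch of one way they could fit together, not the authors' implementation. It assumes dense term vectors, a hypothetical similarity `threshold`, and a greedy loop in which each pass picks the unassigned vector with the most non-zero elements as the seed, groups every remaining vector whose cosine similarity to that seed meets the threshold, and summarizes the group by a unit-length concept vector. Under the further assumption that k corresponds to the number of clusters produced, each pass scans the remaining documents once, which gives on the order of kn similarity computations. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def concept_vector(vectors):
    """Centroid of a group of vectors, scaled to unit length."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


def seed_clustering(docs, threshold=0.5):
    """Greedy seed-based clustering (assumed reading of the abstract).

    Returns a list of (member_indices, concept_vector) pairs.
    `threshold` is a hypothetical parameter, not taken from the paper.
    """
    remaining = list(range(len(docs)))
    clusters = []
    while remaining:
        # Seed: the unassigned vector with the most non-zero elements.
        seed = max(remaining, key=lambda i: sum(1 for x in docs[i] if x != 0))
        # One linear pass over the remaining documents for this cluster.
        members = [
            i for i in remaining
            if i == seed or cosine(docs[seed], docs[i]) >= threshold
        ]
        clusters.append((members, concept_vector([docs[i] for i in members])))
        assigned = set(members)
        remaining = [i for i in remaining if i not in assigned]
    return clusters


if __name__ == "__main__":
    toy_docs = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
    for members, concept in seed_clustering(toy_docs, threshold=0.5):
        print(members, [round(x, 3) for x in concept])
```

Because the seed always joins its own cluster, every pass removes at least one document, so the loop terminates. How the paper actually uses the concept vector during assignment is not stated in the abstract; here it serves only as a cluster summary.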