Documents clustering using tolerance rough set model and its application to information retrieval

  • Authors:
  • Tu Bao Ho;Saori Kawasaki;Ngoc Binh Nguyen

  • Affiliations:
  • Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, 923-1292 Japan;Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, 923-1292 Japan;Hanoi University of Technology, DaiCoViet Road, Hanoi, Vietnam

  • Venue:
  • Intelligent exploration of the web
  • Year:
  • 2003

Abstract

Clustering is a powerful tool for analyzing text collections and finding useful information in them. However, document clustering is a difficult clustering problem because of the unstructured form and textual characteristics of documents. As a consequence, the quality of document clustering depends not only on the clustering algorithm but also on the document representation model. In this work we introduce a tolerance rough set model (TRSM) for representing documents as an alternative way of capturing semantic relatedness between documents. Using TRSM we develop two clustering algorithms for documents, one hierarchical and one nonhierarchical, and apply these clustering methods to information retrieval. The TRSM clustering methods and the TRSM cluster-based information retrieval method are evaluated and validated by comparative experiments on test collections.
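
The abstract does not spell out the model, so the following is only a minimal sketch of one common formulation of a tolerance rough set representation, not the authors' implementation. It assumes that two terms are "tolerant" of each other when they co-occur in at least `theta` documents, and that a document is enriched with every term whose tolerance class overlaps its own terms (its upper approximation). The function names, the threshold `theta`, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents.
    (Assumed definition; the abstract does not give one.)"""
    cooc = Counter()
    for doc in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    terms = set().union(*docs)
    classes = {t: {t} for t in terms}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares a term with `doc`."""
    doc = set(doc)
    return {t for t, cls in classes.items() if cls & doc}


# Toy collection (hypothetical): each document is a set of index terms.
docs = [
    {"rough", "set", "tolerance"},
    {"rough", "set", "clustering"},
    {"clustering", "retrieval"},
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation({"rough"}, classes)
```

In this toy collection only "rough" and "set" co-occur in two documents, so a document containing just "rough" is enriched with "set". Documents enriched this way can share terms even when their original term sets are disjoint, which is the kind of relatedness a clustering algorithm could then exploit.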