Documents clustering using tolerance rough set model and its application to information retrieval

Authors:
Tu Bao Ho;Saori Kawasaki;Ngoc Binh Nguyen
Affiliations:
Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, 923-1292 Japan;Japan Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa, 923-1292 Japan;Hanoi University of Technology, DaiCoViet Road, Hanoi, Vietnam
Venue:
Intelligent exploration of the web
Year:
2003

Abstract

Clustering is a powerful tool for analyzing text collections and finding useful information in them. However, document clustering is a difficult clustering problem because documents are unstructured and textual. As a consequence, the quality of document clustering depends not only on the clustering algorithm but also on the document representation model. In this work we introduce a tolerance rough set model (TRSM) for representing documents, as an alternative way of capturing semantic relatedness between documents. Using TRSM we develop two clustering algorithms for documents, one hierarchical and one nonhierarchical, and apply these clustering methods to information retrieval. The TRSM clustering methods and the TRSM cluster-based information retrieval method are evaluated and validated by comparative experiments on test collections.
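
The abstract names TRSM without giving its definition. As a rough illustration only, the sketch below follows a formulation commonly used for TRSM: two terms are treated as related when they co-occur in at least a threshold number of documents, and a document is enriched with every term whose tolerance class overlaps the document's own terms (its upper approximation). The function names, the threshold `theta`, and the toy documents are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus the terms it co-occurs with in >= theta documents.

    Assumption: co-occurrence is counted once per document.
    """
    cooccurrence = Counter()
    vocabulary = set()
    for doc in docs:
        terms = set(doc)
        vocabulary |= terms
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(terms), 2):
            cooccurrence[(a, b)] += 1

    classes = {term: {term} for term in vocabulary}
    for (a, b), count in cooccurrence.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares at least one term with doc."""
    doc_terms = set(doc)
    return {term for term, cls in classes.items() if cls & doc_terms}


# Hypothetical toy collection: "rough" and "set" co-occur in two documents.
docs = [
    ["rough", "set", "cluster"],
    ["rough", "set", "retrieval"],
    ["rough", "document"],
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation(docs[2], classes)
```

Here the third document never mentions "set", yet its enriched representation includes it because "set" is tolerant with "rough". Two documents with no literal term in common can therefore still overlap after enrichment, which is the kind of semantic relatedness the abstract refers to.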