Clustering is a powerful tool for analyzing text collections and finding useful information in them. However, document clustering is a difficult clustering problem because documents are unstructured and have distinctive textual characteristics. As a consequence, the quality of document clustering depends not only on the clustering algorithm but also on the document representation model. In this work we introduce a tolerance rough set model (TRSM) for representing documents, as an alternative way of capturing semantic relatedness between documents. Using TRSM, we develop two clustering algorithms for documents, one hierarchical and one nonhierarchical, and apply them to information retrieval. The TRSM clustering methods and the TRSM cluster-based information retrieval method are carefully evaluated and validated by comparative experiments on test collections.
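The abstract does not spell out how TRSM represents a document, so the following is only a minimal sketch under stated assumptions. It assumes the commonly described co-occurrence formulation: two terms are tolerant of each other when they appear together in at least `theta` documents, and a document's enriched representation is the upper approximation of its term set, meaning every term whose tolerance class overlaps the document. The function names, the `theta` parameter, and the toy collection are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus the terms co-occurring with it in >= theta documents.

    Assumption: tolerance is defined by document-level co-occurrence counts.
    """
    cooc = defaultdict(int)
    for doc in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1

    vocab = set().union(*docs)
    classes = {t: {t} for t in vocab}  # every term tolerates itself
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return all terms whose tolerance class shares at least one term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


# Hypothetical toy collection, for illustration only.
docs = [
    {"rough", "set", "cluster"},
    {"rough", "set", "retrieval"},
    {"cluster", "document"},
]
classes = tolerance_classes(docs, theta=2)
enriched = upper_approximation({"rough", "cluster"}, classes)
```

In this toy collection only "rough" and "set" co-occur twice, so a document containing "rough" but not "set" still picks up "set" in its enriched representation. That is the sense in which such a model can relate documents that share few literal terms.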