Organizing web search results into a hierarchy of topics and subtopics facilitates browsing the collection and locating results of interest. In this paper, we propose a new method based on formal concept analysis (FCA) to build a two-level hierarchy for the retrieved search results of a query. After formal concepts are extracted using FCA, a new algorithm is proposed to extract the concepts most relevant to the query, and a two-level hierarchy is built and presented to the user. Evaluating the quality of the resulting clusters is a non-trivial task. Two improved objective metrics of clustering quality, ANMI@K and ANCE@K, are proposed in this paper. We compare our method with three other search results clustering (SRC) algorithms: Suffix Tree Clustering (STC), Lingo, and Vivisimo, using a comprehensive set of documents obtained from the Open Directory Project hierarchy as a benchmark. In addition to the comparison based on objective measures, we also subjectively analyze the properties of the cluster labels produced by the different SRC algorithms. The experimental results show that our method outperforms the other three SRC algorithms and is helpful for browsing and locating results of interest.
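To make the FCA step concrete, the following is a minimal sketch of how formal concepts can be enumerated from a small document-term context. It is not the algorithm proposed in the paper: the toy context, the function names (`common_terms`, `matching_docs`, `formal_concepts`), and the brute-force enumeration over document subsets are all assumptions made for illustration, and the approach is only practical for very small inputs.

```python
from itertools import combinations

# Hypothetical toy context: each search result (object) maps to the
# set of terms (attributes) it contains.
context = {
    "d1": {"jaguar", "car"},
    "d2": {"jaguar", "car", "dealer"},
    "d3": {"jaguar", "cat"},
    "d4": {"jaguar", "cat", "habitat"},
}


def common_terms(docs, context):
    """Derivation operator: terms shared by every document in `docs`.

    For an empty set of documents this is the full term vocabulary.
    """
    shared = set().union(*context.values())
    for d in docs:
        shared &= context[d]
    return frozenset(shared)


def matching_docs(terms, context):
    """Derivation operator: documents that contain every term in `terms`."""
    return frozenset(d for d, doc_terms in context.items() if terms <= doc_terms)


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    A pair is a formal concept when the extent is exactly the set of
    documents sharing the intent, and the intent is exactly the set of
    terms shared by the extent. Closing every subset of documents
    yields all such pairs (exponential in the number of documents).
    """
    concepts = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset, context)
            extent = matching_docs(intent, context)
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for extent, intent in sorted(formal_concepts(context), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy context, the concept with intent {jaguar, car} groups d1 and d2, while the concept with intent {jaguar, cat} groups d3 and d4. Concepts of this kind are the candidates from which a topic and subtopic hierarchy could be selected; how the paper ranks them by relevance to the query is not shown here.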