Enhancing search result clustering with semantic indexing

Authors:
Sinh Hoa Nguyen; Grzegorz Jaśkiewicz; Wojciech Świeboda; Hung Son Nguyen
Affiliations:
The University of Warsaw, Banacha, Warsaw, Poland and Polish-Japanese Institute of Information Technology, Koszykowa, Warsaw, Poland; The University of Warsaw, Banacha, Warsaw, Poland; The University of Warsaw, Banacha, Warsaw, Poland; The University of Warsaw, Banacha, Warsaw, Poland
Venue:
Proceedings of the Third Symposium on Information and Communication Technology
Year:
2012

Abstract

Semantic clustering of search results is one of the most desired features of many information retrieval systems, including general web search engines as well as domain-specific article portals and digital libraries. It can help users express their information needs more precisely. In this paper, we discuss a framework for extending document descriptions using domain knowledge and semantic similarity. Our approach applies the Tolerance Rough Set Model (TRSM), together with semantic information extracted from the source text and a domain ontology, to approximate the concepts associated with documents and to enrich their vector representations. We build several document representation models based on document metadata, citations, and semantic information derived from the MeSH ontology. We compare these models on a search result clustering task over freely accessible biomedical research articles from the PubMed Central (PMC) portal. The experimental results show the advantages of the proposed models.
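To make the TRSM enrichment step more concrete, the following is a minimal Python sketch of the generic Tolerance Rough Set Model as it is usually described in earlier TRSM clustering work. It is not the authors' implementation. It assumes documents are already tokenized into index terms (for example stemmed words or MeSH descriptors). The threshold `theta`, the helper names, the toy corpus, and the weighting scheme (log-scaled tf-idf for terms that occur in the document, a damped weight for terms added only by the upper approximation) are illustrative assumptions. Two terms are treated as tolerant when they co-occur in at least `theta` documents, and a document's upper approximation adds every term that is tolerant with one of its own terms.

```python
import math
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term t to its tolerance class: t itself plus every term that
    co-occurs with t in at least `theta` documents (theta is an assumed parameter)."""
    cooc = Counter()
    for doc in docs:
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for doc in docs for t in doc}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """All terms whose tolerance class shares at least one term with the document."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def enriched_vector(doc, docs, classes):
    """TRSM-style weights over the upper approximation, L2-normalised.
    Assumes `doc` is non-empty and belongs to `docs`, so every term has df > 0."""
    m = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    idf = {t: math.log(m / n) for t, n in df.items()}
    tf = Counter(doc)
    # Terms that occur in the document: log-scaled term frequency times idf.
    weights = {t: (1 + math.log(f)) * idf[t] for t, f in tf.items()}
    base = min(weights.values())
    # Terms added only through the upper approximation get a damped weight
    # that stays below the smallest weight of an original term.
    for t in upper_approximation(doc, classes) - set(tf):
        weights[t] = base * idf[t] / (1 + idf[t])
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else weights


# Hypothetical toy corpus of tokenized abstracts.
docs = [
    ["gene", "protein", "cell"],
    ["gene", "protein", "mutation"],
    ["gene", "expression"],
    ["cell", "membrane"],
]
classes = tolerance_classes(docs, theta=2)
vec = enriched_vector(docs[2], docs, classes)
# "protein" never occurs in docs[2] but is added with a small weight,
# because it co-occurs with "gene" in two documents.
print(sorted(vec.items()))
```

The tolerance relation built this way is reflexive and symmetric but not transitive, so tolerance classes may overlap. That overlap is what lets the upper approximation pull related terms into short documents, such as search result snippets, that would otherwise share few terms with each other.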