Introduction to Data Mining, (First Edition)
Introduction to Data Mining, (First Edition)
Introduction to Information Retrieval
Introduction to Information Retrieval
Brighthouse: an analytic data warehouse for ad-hoc queries
Proceedings of the VLDB Endowment
Computing semantic relatedness using Wikipedia-based explicit semantic analysis
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
DBpedia - A crystallization point for the Web of Data
Web Semantics: Science, Services and Agents on the World Wide Web
Clustering of rough set related documents with use of knowledge from DBpedia
RSKT'11 Proceedings of the 6th international conference on Rough sets and knowledge technology
Semantic analytics of pubmed content
USAB'11 Proceedings of the 7th conference on Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society: information Quality in e-Health
Dynamic rule-based similarity model for DNA microarray data
Transactions on Rough Sets XV
Unsupervised Similarity Learning from Textual Data
Fundamenta Informaticae - Concurrency Specification and Programming (CS&P)
Hi-index | 0.00 |
This paper summarizes our recent research on semantic clustering of scientific articles. We present a case study which was focused on analysis of papers related to the Rough Sets theory. The proposed method groups the documents on the basis of their content, with an assistance of the DBpedia knowledge base. The text corpus is first processed using Natural Language Processing tools in order to produce vector representations of the content. In the second step the articles are matched against a collection of concepts retrieved from DBpedia. As a result, a new representation that better reflects the semantics of the texts, is constructed. With this new representation the documents are hierarchically clustered in order to form a partitioning of papers into semantically related groups. The steps in textual data preparation, the utilization of DBpedia and the employed clustering methods are explained and illustrated with experimental results. A quality of the resulting clustering is then discussed. It is assessed using feedback form human experts combined with typical cluster quality measures. These results are then discussed in the context of a larger framework that aims to facilitate search and information extraction from large textual repositories.