Clustering of rough set related documents with use of knowledge from DBpedia

Authors:
Marcin Szczuka;Andrzej Janusz;Kamil Herba
Affiliations:
Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Warsaw, Poland;Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Warsaw, Poland;Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Warsaw, Poland
Venue:
RSKT'11 Proceedings of the 6th international conference on Rough sets and knowledge technology
Year:
2011

Abstract

A case study of semantic clustering of scientific articles related to rough sets is presented. The proposed method groups documents on the basis of their content, with the assistance of the DBpedia knowledge base. The text corpus is first processed with Natural Language Processing tools to produce vector representations of the content, which are then matched against a collection of concepts retrieved from DBpedia. The result is a new representation that better reflects the semantics of the texts. Using this representation, the documents are hierarchically clustered to form a partition into groups of semantically related papers. The steps of textual data preparation, utilization of DBpedia, and clustering are explained and illustrated with results of experiments performed on a corpus of scientific documents about rough sets.
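
The three stages named in the abstract (term vectors, matching against concepts, hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The concept descriptions in `CONCEPTS` are hypothetical stand-ins for text retrieved from DBpedia, cosine similarity is assumed as the matching measure, and average-linkage agglomeration is assumed as the clustering scheme; the abstract specifies none of these details.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (a stand-in for the NLP step)."""
    return Counter(w for w in text.lower().split() if w.isalpha())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# ASSUMPTION: hypothetical concept descriptions standing in for DBpedia content.
CONCEPTS = {
    "Rough_set": "rough set approximation lower upper boundary region",
    "Cluster_analysis": "cluster clustering grouping similarity partition",
}

def concept_representation(text, concepts=CONCEPTS):
    """Re-express a document as its similarity to each concept."""
    doc = tf_vector(text)
    return {name: cosine(doc, tf_vector(desc)) for name, desc in concepts.items()}

def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        # Merge the two clusters with the highest average similarity.
        a, b = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy corpus, invented for illustration only.
DOCS = [
    "lower and upper approximation of a rough set",
    "boundary region in rough set theory",
    "hierarchical clustering yields a partition",
    "grouping documents by similarity",
]

clusters = agglomerate([concept_representation(d) for d in DOCS], 2)
print(clusters)
```

On this toy corpus the two rough set texts end up in one cluster and the two clustering texts in the other, even though the documents within each pair share few or no content words. This is the effect the concept-based representation is meant to produce: similarity is measured through shared concepts rather than shared terms.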