Clustering of rough set related documents with use of knowledge from DBpedia

  • Authors:
  • Marcin Szczuka;Andrzej Janusz;Kamil Herba

  • Affiliations:
  • Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Warsaw, Poland;Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Warsaw, Poland;Faculty of Mathematics, Informatics, and Mechanics, The University of Warsaw, Warsaw, Poland

  • Venue:
  • RSKT'11 Proceedings of the 6th international conference on Rough sets and knowledge technology
  • Year:
  • 2011

Abstract

A case study of semantic clustering of scientific articles related to rough sets is presented. The proposed method groups the documents on the basis of their content and with the assistance of the DBpedia knowledge base. The text corpus is first processed with Natural Language Processing tools to produce vector representations of the content, and these are then matched against a collection of concepts retrieved from DBpedia. As a result, a new representation is constructed that better reflects the semantics of the texts. With this new representation, the documents are hierarchically clustered to form a partition of the papers into semantically related groups. The steps in textual data preparation, utilization of DBpedia, and clustering are explained and illustrated with results of experiments performed on a corpus of scientific documents about rough sets.
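
The pipeline outlined in the abstract (term vectors, matching against concepts, hierarchical clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The `CONCEPTS` dictionary is a hypothetical stand-in for concepts retrieved from DBpedia, the keyword-overlap weighting is an assumed simplification of the matching step, and cosine distance with average linkage is one possible clustering choice among many. The sample documents are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for concepts retrieved from DBpedia: each concept is
# described by a few keywords. Real descriptions would come from the
# knowledge base itself.
CONCEPTS = {
    "Rough set": {"rough", "approximation", "reduct", "indiscernibility"},
    "Cluster analysis": {"clustering", "cluster", "partition", "hierarchical"},
    "Fuzzy logic": {"fuzzy", "membership", "logic"},
}

def term_vector(text):
    """Bag-of-words term counts (a deliberately crude NLP step)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def concept_vector(terms):
    """Re-express a term vector as one keyword-overlap weight per concept."""
    return [sum(terms[k] for k in keywords) for keywords in CONCEPTS.values()]

def cosine_distance(u, v):
    """1 - cosine similarity; vectors with zero norm are maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Invented sample documents, for illustration only.
DOCS = [
    "Rough approximation and reduct computation",
    "Indiscernibility relations in rough approximation spaces",
    "Fuzzy membership functions and fuzzy logic",
    "Hierarchical clustering yields a partition of cluster candidates",
]

vectors = [concept_vector(term_vector(d)) for d in DOCS]
result = agglomerative(vectors, 3)
print(result)
```

In this toy setting the first two documents map to the same concept vector and are merged first, while the other two stay in separate clusters. Stopping the merging at a different number of clusters gives a coarser or finer partition of the same hierarchy.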