Ontology-based flexible topic classification of crowdsourcing textual resources

  • Authors:
  • Stefan Daniel Dumitrescu; Stefan Trausan-Matu; Mihaela Brut; Florence Sedes

  • Affiliations:
  • Romanian Academy Research Institute for Artificial Intelligence, Bucharest, Romania; Politehnica University of Bucharest, Bucharest, Romania; Thales Services, Palaiseau, France; Research Inst. in Computer Science, Toulouse, France

  • Venue:
  • Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems
  • Year:
  • 2013

Abstract

The paper presents a solution to the problem of exploiting, in different contexts and by different stakeholders, the new time-stamped documents produced by social Web sites (including news, blog entries, and uploaded documents). The core of the solution is an ontology-based method to express topics of interest and to automatically classify documents against them. For such textual content obtained in real time, we propose an unsupervised text classification system based on the general-purpose YAGO ontology, graph algorithms, and a custom scoring method. The system shows good performance using only the information in the ontology and the ontology structure itself. We compare our system against a classic SVM-based (Support Vector Machine) text classification approach. To determine the relevance of a specific document to a specific topic, our approach builds and compares the ontology subgraphs corresponding to the query and to the document. This gives high flexibility in reusing already classified documents when the topic of interest is refined or changed: a graph-based matching of the existing ontology-based document representation against the new query representation is enough to assess the document's relevance.
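
The idea of matching a document subgraph against a query subgraph can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy ontology stands in for YAGO, the names `TOY_ONTOLOGY`, `concept_subgraph`, and `relevance` are hypothetical, and a plain Jaccard overlap of edges is swapped in for the paper's custom scoring method, which the abstract does not specify.

```python
from collections import deque

# Hypothetical toy ontology (child concept -> parent concepts); not real YAGO data.
TOY_ONTOLOGY = {
    "earthquake": ["natural_disaster"],
    "flood": ["natural_disaster"],
    "natural_disaster": ["event"],
    "football_match": ["sports_event"],
    "sports_event": ["event"],
    "event": [],
}


def concept_subgraph(seeds, ontology, max_depth=2):
    """Collect (child, parent) edges reachable upward from the seed concepts.

    Breadth-first expansion, limited to max_depth hops from any seed.
    """
    edges = set()
    queue = deque((s, 0) for s in seeds if s in ontology)
    seen = {s for s, _ in queue}
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in ontology.get(node, []):
            edges.add((node, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return edges


def relevance(doc_edges, query_edges):
    """Jaccard overlap of two edge sets (an assumed stand-in for the custom score)."""
    union = doc_edges | query_edges
    if not union:
        return 0.0
    return len(doc_edges & query_edges) / len(union)


# The document representation is built once...
doc_graph = concept_subgraph(["earthquake"], TOY_ONTOLOGY)

# ...and can be matched against any new or refined query without reprocessing the text.
score_flood = relevance(doc_graph, concept_subgraph(["flood"], TOY_ONTOLOGY))
score_sport = relevance(doc_graph, concept_subgraph(["football_match"], TOY_ONTOLOGY))
```

In this sketch the document about an earthquake shares the `natural_disaster` to `event` edge with the flood query and nothing with the football query, so the first score is higher. The point is only that changing the topic requires a new query subgraph, not a new pass over the document.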