SemaFor: semantic document indexing using semantic forests

  • Authors:
  • George Tsatsaronis;Iraklis Varlamis;Kjetil Nørvåg

  • Affiliations:
  • Technische Universität Dresden, Dresden, Germany;Harokopio University of Athens, Athens, Greece;Norwegian University of Science and Technology, Trondheim, Norway

  • Venue:
  • Proceedings of the 21st ACM international conference on Information and knowledge management
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Traditional document indexing techniques store documents using easily accessible representations, such as inverted indices, which can efficiently scale for large document sets. These structures offer scalable and efficient solutions in text document management tasks, though, they omit the cornerstone of the documents' purpose: meaning. They also neglect semantic relations that bind terms into coherent fragments of text that convey messages. When semantic representations are employed, the documents are mapped to the space of concepts and the similarity measures are adapted appropriately to better fit the retrieval tasks. However, these methods can be slow both at indexing and retrieval time. In this paper we propose SemaFor, an indexing algorithm for text documents, which uses semantic spanning forests constructed from lexical resources, like Wikipedia, and WordNet, and spectral graph theory in order to represent documents for further processing.