SemaFor: semantic document indexing using semantic forests

Authors:
George Tsatsaronis; Iraklis Varlamis; Kjetil Nørvåg
Affiliations:
Technische Universität Dresden, Dresden, Germany; Harokopio University of Athens, Athens, Greece; Norwegian University of Science and Technology, Trondheim, Norway
Venue:
Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Year:
2012



Abstract

Traditional document indexing techniques store documents in easily accessible representations, such as inverted indices, which scale efficiently to large document sets. These structures offer scalable and efficient solutions for text document management tasks, but they omit the cornerstone of a document's purpose: meaning. They also neglect the semantic relations that bind terms into coherent fragments of text that convey messages. When semantic representations are employed, documents are mapped to the space of concepts and the similarity measures are adapted to better fit the retrieval tasks. However, these methods can be slow at both indexing and retrieval time. In this paper we propose SemaFor, an indexing algorithm for text documents that uses semantic spanning forests constructed from lexical resources, such as Wikipedia and WordNet, together with spectral graph theory, to represent documents for further processing.
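
The abstract does not spell out how the forests are built, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical toy relatedness table in place of scores derived from WordNet or Wikipedia, an arbitrary threshold of 0.5, and Kruskal's algorithm as a stand-in for the forest construction. The resulting graph Laplacian is the kind of matrix whose eigenvalues a spectral method could use as a document signature.

```python
from itertools import combinations

# Hypothetical relatedness scores in [0, 1] (an assumption for illustration);
# a real system would derive them from a lexical resource.
TOY_RELATEDNESS = {
    frozenset(("bank", "loan")): 0.9,
    frozenset(("bank", "money")): 0.8,
    frozenset(("loan", "money")): 0.7,
    frozenset(("river", "water")): 0.85,
}


def relatedness(a, b):
    """Look up the toy score for an unordered term pair (0.0 if unknown)."""
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)


def spanning_forest(terms, threshold=0.5):
    """Maximum-weight spanning forest via Kruskal's algorithm.

    Only term pairs with relatedness >= threshold become candidate edges,
    so unrelated groups of terms end up in separate trees.
    """
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    candidates = [(relatedness(a, b), a, b) for a, b in combinations(terms, 2)]
    candidates = sorted((e for e in candidates if e[0] >= threshold), reverse=True)

    forest = []
    for weight, a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not close a cycle
            parent[root_a] = root_b
            forest.append((a, b, weight))
    return forest


def laplacian(terms, forest):
    """Weighted graph Laplacian L = D - W of the forest, as nested lists."""
    index = {t: i for i, t in enumerate(terms)}
    size = len(terms)
    matrix = [[0.0] * size for _ in range(size)]
    for a, b, weight in forest:
        i, j = index[a], index[b]
        matrix[i][i] += weight
        matrix[j][j] += weight
        matrix[i][j] -= weight
        matrix[j][i] -= weight
    return matrix


terms = ["bank", "loan", "money", "river", "water"]
forest = spanning_forest(terms)
L = laplacian(terms, forest)
```

In this toy input the financial terms and the water terms fall into two separate trees, and the weakest edge of the bank/loan/money triangle is dropped because it would close a cycle. Passing `L` to an eigenvalue routine from a linear algebra library would be the natural next step for a spectral representation.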