Traditional document indexing techniques store documents in easily accessible representations, such as inverted indices, which scale efficiently to large document sets. These structures offer scalable and efficient solutions for text document management tasks, but they omit the cornerstone of a document's purpose: meaning. They also neglect the semantic relations that bind terms into coherent fragments of text that convey messages. When semantic representations are employed, documents are mapped to a space of concepts and the similarity measures are adapted to better fit the retrieval task. However, these methods can be slow at both indexing and retrieval time. In this paper we propose SemaFor, an indexing algorithm for text documents that uses semantic spanning forests constructed from lexical resources, such as Wikipedia and WordNet, together with spectral graph theory to represent documents for further processing.
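As a rough illustration of the two ingredients named above (a spanning forest over related terms, and a matrix that spectral graph theory can analyze), the following minimal sketch may help. It is not the SemaFor algorithm, whose details the abstract does not give. The term list, the relatedness scores, and the function names `spanning_forest` and `laplacian` are all hypothetical; in practice the scores would presumably be derived from a lexical resource such as WordNet or Wikipedia.

```python
def spanning_forest(nodes, weighted_edges):
    """Kruskal-style maximum-weight spanning forest.

    Keeps the strongest edges that do not close a cycle, so each
    connected group of terms ends up as one tree.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    forest = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate trees
            parent[root_u] = root_v
            forest.append((u, v, w))
    return forest


def laplacian(nodes, edges):
    """Weighted graph Laplacian (degree matrix minus adjacency matrix)."""
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for u, v, w in edges:
        i, j = index[u], index[v]
        matrix[i][i] += w
        matrix[j][j] += w
        matrix[i][j] -= w
        matrix[j][i] -= w
    return matrix


# Hypothetical terms from one document, with made-up relatedness scores.
terms = ["bank", "loan", "money", "river", "water"]
relatedness = [
    ("bank", "loan", 0.9),
    ("loan", "money", 0.8),
    ("bank", "money", 0.7),
    ("river", "water", 0.85),
]

forest = spanning_forest(terms, relatedness)
L = laplacian(terms, forest)
n_trees = len(terms) - len(forest)  # a forest with k edges on n nodes has n - k trees
```

In this toy input the weakest edge of the bank/loan/money cycle is dropped, leaving two trees. The eigenvalues of `L`, computed with any numerical library, are one example of the kind of spectral feature that could summarize such a forest; whether SemaFor uses this particular matrix is an assumption.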