Insights into explicit semantic analysis

Authors:
Thomas Gottron;Maik Anderka;Benno Stein
Affiliations:
University of Koblenz-Landau, Koblenz, Germany;Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 7

Generalized vector spaces model in information retrieval
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval

The ESA retrieval model revisited
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Concept-based feature generation and selection for information retrieval
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2

Computing semantic relatedness using Wikipedia-based explicit semantic analysis
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence

Explicit versus latent concept models for cross-language information retrieval
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence

A Wikipedia-based multilingual retrieval model
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval

Semantically enhanced term frequency
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval

Cited 4

A breakdown of quality flaws in Wikipedia
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality

Exploiting Wikipedia for cross-lingual and multilingual information retrieval
Data & Knowledge Engineering

Cross lingual semantic search by improving semantic similarity and relatedness measures
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part II

Improving ESA with document similarity
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Abstract

Since its debut, Explicit Semantic Analysis (ESA) has received much attention in the IR community. ESA has been shown to perform surprisingly well in several tasks and in different contexts. However, recent work has observed behavior that is unexpected given the conceptual motivation for ESA. In this paper we look at the foundations of ESA from a theoretical point of view and employ a general probabilistic model for term weights, which reveals how ESA actually works. Based on this model, we explain some of the phenomena that have been observed in previous work and support our findings with new experiments. Moreover, we provide a theoretical grounding for how the size and the composition of the index collection affect the ESA-based computation of similarity values for texts.
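
For readers unfamiliar with the ESA-based similarity computation mentioned in the abstract, the following is a minimal sketch of the commonly described scheme: each text is mapped to a vector of its similarities to the documents of an index collection, and two texts are then compared by the cosine of these concept vectors. It assumes tf-idf term weights and cosine similarity; the function names and the three-document toy index collection are hypothetical, and the sketch does not reproduce the probabilistic term-weight model developed in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems use more preprocessing).
    return text.lower().split()


def build_index(index_docs):
    """Compute tf-idf vectors for the index collection and the idf table."""
    counts_per_doc = [Counter(tokenize(d)) for d in index_docs]
    n = len(index_docs)
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())
    # The +1 keeps terms that occur in every index document from getting weight 0.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in counts.items()} for counts in counts_per_doc]
    return vectors, idf


def cosine_sparse(u, v):
    # Cosine similarity of two sparse term-weight dictionaries.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def esa_vector(text, concept_vectors, idf):
    """Map a text to its concept vector: one similarity per index document."""
    counts = Counter(tokenize(text))
    # Terms unseen in the index collection are dropped.
    text_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    return [cosine_sparse(text_vec, cv) for cv in concept_vectors]


def esa_similarity(text_a, text_b, concept_vectors, idf):
    """Cosine similarity of the two texts in concept space."""
    va = esa_vector(text_a, concept_vectors, idf)
    vb = esa_vector(text_b, concept_vectors, idf)
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy index collection; ESA is usually run over Wikipedia articles.
index_collection = [
    "cat dog pet animal",
    "stock market money bank",
    "guitar music song band",
]
concept_vectors, idf = build_index(index_collection)

# The first pair shares no terms but activates the same index document.
print(esa_similarity("cat pet", "dog animal", concept_vectors, idf))
print(esa_similarity("cat pet", "money bank", concept_vectors, idf))
```

Because every similarity value is computed relative to the index documents, changing the size or composition of `index_collection` changes the resulting scores, which is the dependency the paper analyzes theoretically.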