Insights into explicit semantic analysis

  • Authors:
  • Thomas Gottron;Maik Anderka;Benno Stein

  • Affiliations:
  • University of Koblenz-Landau, Koblenz, Germany;Bauhaus-Universität Weimar, Weimar, Germany;Bauhaus-Universität Weimar, Weimar, Germany

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Visualization

Abstract

Since its debut the Explicit Semantic Analysis (ESA) has received much attention in the IR community. ESA has been proven to perform surprisingly well in several tasks and in different contexts. However, given the conceptual motivation for ESA, recent work has observed unexpected behavior. In this paper we look at the foundations of ESA from a theoretical point of view and employ a general probabilistic model for term weights which reveals how ESA actually works. Based on this model we explain some of the phenomena that have been observed in previous work and support our findings with new experiments. Moreover, we provide a theoretical grounding on how the size and the composition of the index collection affect the ESA-based computation of similarity values for texts.