Improving ESA with document similarity

  • Authors:
  • Tamara Polajnar; Nitish Aggarwal; Kartik Asooja; Paul Buitelaar

  • Affiliations:
  • Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland; Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland; Ontology Engineering Group, Universidad Politecnica de Madrid, Madrid, Spain; Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland

  • Venue:
  • ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
  • Year:
  • 2013

Abstract

Explicit semantic analysis (ESA) is a technique for computing semantic relatedness between natural language texts. It is a document-based distributional model similar to latent semantic analysis (LSA) and, when required for general English usage, is often built on the Wikipedia database. Unlike LSA, however, ESA does not use dimensionality reduction, so it is sometimes unable to account for similarity between words that do not co-occur with the same concepts, even if the concepts themselves cover similar subjects. In the Wikipedia implementation, ESA concepts are Wikipedia articles, and the Wikilinks between the articles are used to overcome this concept-similarity problem. In this paper, we provide two general solutions for integrating concept-concept similarities into the ESA model. These solutions do not rely on a particular corpus structure and do not alter the explicit concept-mapping properties that distinguish ESA from models like LSA and latent Dirichlet allocation (LDA).
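
The concept-similarity problem described in the abstract can be illustrated with a small sketch. The abstract does not specify the paper's two solutions, so the code below is not the authors' method; it shows one generic way a concept-concept similarity table might be folded into ESA vectors while keeping every dimension an explicit concept. All names and values (TERM_CONCEPTS, CONCEPT_SIM, the weights, the smooth function) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: each term maps to weighted concepts
# (in a real ESA model these would be, e.g., TF-IDF weights over articles).
CONCEPTS = ["Felidae", "Big cats", "Automobile"]
TERM_CONCEPTS = {
    "cat": {"Felidae": 1.0},
    "lion": {"Big cats": 1.0},
    "car": {"Automobile": 1.0},
}

# Hypothetical concept-concept similarities (symmetric; 1.0 on the diagonal).
CONCEPT_SIM = {("Felidae", "Big cats"): 0.8}


def concept_sim(a, b):
    """Look up the similarity of two concepts, defaulting to 0.0."""
    if a == b:
        return 1.0
    return CONCEPT_SIM.get((a, b), CONCEPT_SIM.get((b, a), 0.0))


def esa_vector(text):
    """Map a text to a vector over explicit concepts by summing term weights."""
    vec = dict.fromkeys(CONCEPTS, 0.0)
    for term in text.lower().split():
        for concept, weight in TERM_CONCEPTS.get(term, {}).items():
            vec[concept] += weight
    return vec


def smooth(vec):
    """Spread each concept's weight onto similar concepts (assumed scheme).

    The output has the same explicit concept dimensions as the input.
    """
    return {
        c: sum(vec[d] * concept_sim(d, c) for d in CONCEPTS)
        for c in CONCEPTS
    }


def cosine(u, v):
    """Cosine similarity of two concept vectors; 0.0 if either is all zeros."""
    dot = sum(u[c] * v[c] for c in CONCEPTS)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


plain = cosine(esa_vector("cat"), esa_vector("lion"))
smoothed = cosine(smooth(esa_vector("cat")), smooth(esa_vector("lion")))
unrelated = cosine(smooth(esa_vector("cat")), smooth(esa_vector("car")))
```

In this toy setting, "cat" and "lion" share no concept, so plain ESA gives them zero relatedness, while the smoothed vectors overlap because their concepts are marked as similar. "cat" and "car" stay unrelated in both cases, since no similarity links their concepts.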