Finding aliases on the web using latent semantic analysis

  • Authors:
  • Vinay Bhat, Tim Oates, Vishal Shanbhag, Charles Nicholas

  • Affiliations:
  • Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD (all four authors)

  • Venue:
  • Data & Knowledge Engineering - Special issue: WIDM 2002
  • Year:
  • 2004

Abstract

A common problem when gathering information from the web is that different names are used to refer to the same entity. For example, the city in India referred to as Bombay in some documents may be referred to as Mumbai in others, because its name was officially changed from the former to the latter in 1995. This multiplicity of names can cause search engines to miss relevant documents. Our goal is to develop an automated system that discovers additional names for an entity given just one of its names. Latent semantic analysis (LSA) is generally thought to be well suited to this task [Numerical linear algebra with applications 3(4) (1996) 301]. We demonstrate empirically that, under a broad range of circumstances, LSA performs poorly, and we describe a two-stage algorithm based on LSA that performs significantly better.
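To make the LSA baseline concrete, here is a minimal sketch of plain, single-stage LSA used to rank candidate aliases for a given name. It is not the authors' two-stage algorithm, which the abstract does not specify. It builds a term-by-document count matrix A, computes the top-k eigenpairs of the term co-occurrence matrix A·Aᵀ (whose eigenvalues are the squared singular values of A), represents each term as a row of U_k·Σ_k, and ranks the other terms by cosine similarity to the query name. Assumptions: whitespace tokenization, raw counts with no tf-idf weighting, a hand-picked rank k, and pure-Python power iteration in place of a library SVD. The function names and the toy corpus are hypothetical.

```python
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Raw-count term-by-document matrix: rows are terms, columns are documents."""
    tokenized = [d.lower().split()  for d in docs]  # assumption: whitespace tokens
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for w, c in Counter(toks).items():
            matrix[index[w]][j] = float(c)
    return vocab, matrix


def top_eigenpairs(sym, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(sym)
    m = [row[:] for row in sym]  # copy, because deflation modifies the matrix
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((max(lam, 0.0), v))
        # Deflate: subtract the found component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_term_vectors(docs, k):
    """Map each term to its rank-k LSA coordinates (a row of U_k * Sigma_k)."""
    vocab, a = term_document_matrix(docs)
    aat = [[sum(x * y for x, y in zip(ri, rj)) for rj in a] for ri in a]
    pairs = top_eigenpairs(aat, k)
    return {
        term: [math.sqrt(lam) * vec[i] for lam, vec in pairs]
        for i, term in enumerate(vocab)
    }


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def candidate_aliases(name, docs, k=2):
    """Rank every other term by cosine similarity to `name` in the LSA space."""
    vectors = lsa_term_vectors(docs, k)
    query = vectors[name.lower()]
    scores = [(t, cosine(query, v)) for t, v in vectors.items() if t != name.lower()]
    return sorted(scores, key=lambda p: p[1], reverse=True)


# Hypothetical toy corpus: "bombay" and "mumbai" never co-occur but share context.
docs = [
    "bombay port city india",
    "mumbai port city india",
    "paris france capital",
    "paris france seine",
]
ranked = candidate_aliases("Bombay", docs, k=2)
print(ranked[:4])
```

In this toy corpus "mumbai" scores a cosine near 1 with "bombay", but so do the shared context words "port", "city", and "india", so the top of the ranking does not single out the alias. This illustrates one way a plain LSA ranking can fall short for alias discovery; it is not a reproduction of the paper's experiments.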