Using latent semantic analysis to find different names for the same entity in free text

Authors:
Tim Oates; Vinay Bhat; Vishal Shanbhag
Affiliations:
University of Maryland Baltimore County, Baltimore, MD (all three authors)
Venue:
Proceedings of the 4th international workshop on Web information and data management
Year:
2002

Abstract

A common problem faced when gathering information from the web is the use of different names to refer to the same entity. For example, the Indian city referred to as Bombay in some documents may be referred to as Mumbai in others, because its name was officially changed from the former to the latter in 1995. This multiplicity of names can cause search engines to miss relevant documents. Our goal is to develop an automated system that discovers additional names for an entity given just one of its names. Latent semantic analysis (LSA) is generally thought to be well suited to this task (Berry & Fierro 1996). We demonstrate empirically that, under a broad range of circumstances, LSA performs poorly on this task, and we describe a two-stage algorithm based on LSA that performs significantly better.
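For reference, the plain LSA approach that the abstract evaluates can be sketched as follows: build a term-document matrix, project every term into a rank-k latent space with a truncated SVD, and rank the other terms by cosine similarity to the one known name. This is a minimal sketch of generic LSA term similarity, not the authors' two-stage algorithm, which the abstract does not describe. The function names, the toy corpus, whitespace tokenization, raw term counts with no weighting, and k=2 are all illustrative assumptions. To stay dependency-free, the sketch obtains the left singular vectors as eigenvectors of A·Aᵀ via power iteration with deflation, which is only sensible for tiny matrices.

```python
import math
import random


def term_document_matrix(docs):
    """Raw term-count matrix A (rows = terms, columns = documents)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            A[row[w]][j] += 1.0
    return vocab, A


def top_eigenpairs(M, k, iters=300, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration + deflation."""
    n = len(M)
    M = [r[:] for r in M]  # copy, because deflation modifies the matrix
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # what remains of M is numerically zero
                break
            v = [x / norm for x in w]
        Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, Mv))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate so the next pass finds the next pair
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_term_vectors(A, k):
    """Rows of U_k * Sigma_k: each term's coordinates in the latent space."""
    n = len(A)
    # A A^T = U Sigma^2 U^T: eigenvectors are the left singular vectors,
    # eigenvalues are the squared singular values.
    AAt = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(n)]
           for i in range(n)]
    pairs = top_eigenpairs(AAt, k)
    return [[math.sqrt(max(lam, 0.0)) * u[i] for lam, u in pairs]
            for i in range(n)]


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def candidate_aliases(docs, name, k=2, topn=3):
    """Hypothetical helper: rank other terms by LSA similarity to `name`."""
    vocab, A = term_document_matrix(docs)
    vecs = lsa_term_vectors(A, k)
    q = vecs[vocab.index(name)]
    scored = [(t, cosine(q, v)) for t, v in zip(vocab, vecs) if t != name]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:topn]


if __name__ == "__main__":
    docs = [
        "bombay port city india",
        "mumbai port city india",
        "bombay film industry",
        "mumbai film industry",
        "delhi capital city india",
        "delhi government capital",
    ]
    print(candidate_aliases(docs, "bombay"))
```

In this hand-built corpus the two names never appear in the same document, yet "mumbai" should rank first for "bombay" because both occur in the same contexts. Real web text is far noisier, and the abstract reports that under many conditions this plain LSA ranking performs poorly, which is what motivates the authors' two-stage refinement.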