Computational techniques for improved name search

  • Authors: Beatrice T. Oshika; Bruce Evans; Filip Machi; Janet Tom

  • Affiliations: SPARTA, Inc., Berkeley, CA; TRW, MS 02/1761, Redondo Beach, CA; University of California, Berkeley, Berkeley, CA; Santa Monica Research Center, Unisys, Santa Monica, CA

  • Venue: ANLC '88 Proceedings of the second conference on Applied natural language processing
  • Year: 1988

Abstract

This paper describes enhancements to techniques currently used to search large databases of proper names. The improvements include a Hidden Markov Model (HMM) statistical classifier that identifies the likely linguistic provenance of a surname, and language-specific rules that generate plausible spelling variations of names. These two components were incorporated into a prototype front-end system that drives existing name search procedures. HMMs and sets of linguistic rules were constructed for Farsi, Spanish, and Vietnamese surnames and tested on a database of over 11,000 entries. Preliminary evaluation indicates a 20-30% improvement in retrieval, as measured by the number of correct items retrieved.
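
The two-stage front end described in the abstract (classify the surname's likely language, then expand it with language-specific spelling variants) can be illustrated with a small Python sketch. This is not the authors' system. As a simplification, a character-bigram Markov chain with add-one smoothing stands in for the HMM classifier, and the training names, the rewrite rules, and the function names (`train`, `classify`, `variants`, `expand_query`) are all invented for illustration.

```python
import math

# Toy training samples: illustrative assumptions, not the paper's database.
TRAINING = {
    "spanish": ["garcia", "rodriguez", "martinez", "hernandez", "lopez", "gonzalez"],
    "vietnamese": ["nguyen", "tran", "pham", "huynh", "phan", "truong"],
    "farsi": ["hosseini", "ahmadi", "karimi", "rahimi", "jafari", "mohammadi"],
}

# Hypothetical spelling-variation rules (substring -> replacement) per language.
RULES = {
    "spanish": [("ez", "es"), ("v", "b")],
    "vietnamese": [("ph", "f"), ("uy", "ui")],
    "farsi": [("ss", "s"), ("ei", "ey")],
}

SYMBOLS = 28  # 26 letters plus start/end markers, used for add-one smoothing


def train(names):
    """Count character bigrams, with ^ and $ marking the name boundaries."""
    counts = {}
    for name in names:
        padded = "^" + name + "$"
        for a, b in zip(padded, padded[1:]):
            row = counts.setdefault(a, {})
            row[b] = row.get(b, 0) + 1
    return counts


def log_likelihood(model, name):
    """Smoothed log-probability of the name under one language's bigram model."""
    padded = "^" + name.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = model.get(a, {})
        total += math.log((row.get(b, 0) + 1) / (sum(row.values()) + SYMBOLS))
    return total


MODELS = {lang: train(names) for lang, names in TRAINING.items()}


def classify(name):
    """Return the language whose model assigns the name the highest likelihood."""
    return max(MODELS, key=lambda lang: log_likelihood(MODELS[lang], name))


def variants(name, language):
    """Apply each rewrite rule cumulatively to build a set of spelling variants."""
    found = {name.lower()}
    for old, new in RULES.get(language, []):
        found |= {v.replace(old, new) for v in found}
    return sorted(found)


def expand_query(name):
    """Front end: guess the language, then produce variants to pass to a search."""
    language = classify(name)
    return language, variants(name, language)
```

Calling `expand_query("Gonzalez")` returns the guessed language together with a list of candidate spellings, each of which would then be submitted to the existing name search procedure. A real classifier would need far more training data than this toy sample, and the rules would come from linguistic analysis of each language's transliteration patterns.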