Cross linguistic name matching in English and Arabic: a "one to many mapping" extension of the Levenshtein edit distance algorithm

  • Authors:
  • Andrew T. Freeman;Sherri L. Condon;Christopher M. Ackerman

  • Affiliations:
  • The Mitre Corporation, McLean, Va;The Mitre Corporation, McLean, Va;The Mitre Corporation, McLean, Va

  • Venue:
  • HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a solution to the problem of matching personal names in English to the same names represented in Arabic script. Standard string comparison measures perform poorly on this task due to varying transliteration conventions in both languages and the fact that Arabic script does not usually represent short vowels. Significant improvement is achieved by augmenting the classic Levenshtein edit-distance algorithm with character equivalency classes.