Similarity of Names Across Scripts: Edit Distance Using Learned Costs of N-Grams

  • Authors:
  • Bruno Pouliquen

  • Affiliations:
  • European Commission - Joint Research Centre, Italy 2749 21027 Ispra (VA)

  • Venue:
  • GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Any cross-language processing application has to first tackle the problem of transliteration when facing a language using another script. The first solution consists of using existing transliteration tools, but these tools are not usually suitable for all purposes. For some specific script pairs they do not even exist. Our aim is to discriminate transliterations across different scripts in a unified way using a learning method that builds a transliteration model out of a set of transliterated proper names. We compare two strings using an algorithm that builds a Levenshtein edit distance using n-grams costs. The evaluations carried out show that our similarity measure is accurate.