Mining transliterations from Wikipedia using pair HMMs

  • Authors: Peter Nabende
  • Affiliation: University of Groningen, The Netherlands
  • Venue: NEWS '10: Proceedings of the 2010 Named Entities Workshop
  • Year: 2010

Abstract

This paper describes the use of a pair Hidden Markov Model (pair HMM) system for mining transliteration pairs from noisy Wikipedia data. A pair HMM variant with nine transition parameters, and with emission parameters associated with single-character mappings between the source and target language alphabets, is identified and used to estimate transliteration similarity. Evaluated on a random selection of English-Russian Wikipedia topics, the system achieved a precision of 78% and a recall of 83%.
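The abstract does not give the model equations. As a rough illustration of how a pair HMM can score transliteration similarity, here is a minimal sketch of the standard forward algorithm over a three-state pair HMM: M emits an aligned source-target character pair, and X and Y emit a source or target character against a gap. The forward pass sums the probability of every alignment of the two strings. Every number and name here is an assumption and is not taken from the paper: treating the nine transition parameters as the entries of a 3×3 table over M, X and Y, the begin distribution `INIT`, the uniform `gap_prob`, the `floor` for unseen mappings, and the toy `MATCH` table are all hypothetical. The paper's actual parameterization and trained values may differ.

```python
import math

STATES = ("M", "X", "Y")  # M: aligned char pair, X: source char vs gap, Y: gap vs target char


def pair_hmm_log_prob(src, tgt, init, trans, match_emit, gap_prob=0.04, floor=1e-6):
    """Forward algorithm: log P(src, tgt) summed over all alignments (toy sketch)."""
    if not src and not tgt:
        raise ValueError("at least one string must be non-empty")
    n, m = len(src), len(tgt)
    # f[s][i][j] = P(emitting src[:i] and tgt[:j], ending in state s)
    f = {s: [[0.0] * (m + 1) for _ in range(n + 1)] for s in STATES}

    def into(i, j, to):
        # Probability mass entering state `to` from cell (i, j);
        # cell (0, 0) stands for the silent begin state.
        if i == 0 and j == 0:
            return init[to]
        return sum(f[s][i][j] * trans[s][to] for s in STATES)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # Hypothetical single-character mapping probability, floored if unseen.
                e = match_emit.get((src[i - 1], tgt[j - 1]), floor)
                f["M"][i][j] = e * into(i - 1, j - 1, "M")
            if i > 0:
                f["X"][i][j] = gap_prob * into(i - 1, j, "X")
            if j > 0:
                f["Y"][i][j] = gap_prob * into(i, j - 1, "Y")
    # No explicit end state in this sketch: sum over the three final states.
    return math.log(sum(f[s][n][m] for s in STATES))


INIT = {"M": 0.8, "X": 0.1, "Y": 0.1}
# Hypothetical 3x3 transition table (nine parameters); each row sums to 1.
TRANS = {
    "M": {"M": 0.8, "X": 0.1, "Y": 0.1},
    "X": {"M": 0.7, "X": 0.2, "Y": 0.1},
    "Y": {"M": 0.7, "X": 0.1, "Y": 0.2},
}
# Hypothetical English-to-Russian single-character mappings (Latin key, Cyrillic value).
MATCH = {("a", "а"): 0.1, ("n", "н"): 0.1, ("i", "и"): 0.1, ("v", "в"): 0.1}

good = pair_hmm_log_prob("anna", "анна", INIT, TRANS, MATCH)
bad = pair_hmm_log_prob("anna", "иван", INIT, TRANS, MATCH)
print(good > bad)  # True
```

Raw log-probabilities shrink as strings get longer, so ranking candidate pairs of different lengths would in practice need some normalization, for example a log-odds ratio against a null model; this sketch leaves that out and only compares pairs of equal length.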