Mining transliterations from Wikipedia using pair HMMs

Authors:
Peter Nabende
Affiliations:
University of Groningen, The Netherlands
Venue:
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Year:
2010

Citing 5
Cited 3

A generic framework for machine transliteration

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Inducing sound segment differences using Pair Hidden Markov Models

SigMorPhon '07 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
Transliteration system using pair HMM with weighted FSTs

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Computing word similarity and identifying cognates with pair hidden Markov models

CONLL '05 Proceedings of the Ninth Conference on Computational Natural Language Learning
Advanced Data Mining Techniques

Advanced Data Mining Techniques

Report of NEWS 2010 transliteration mining shared task

NEWS '10 Proceedings of the 2010 Named Entities Workshop
A statistical model for unsupervised and semi-supervised transliteration mining

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A Bayesian Alignment Approach to Transliteration Mining

ACM Transactions on Asian Language Information Processing (TALIP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes the use of a pair Hidden Markov Model (pair HMM) system in mining transliteration pairs from noisy Wikipedia data. A pair HMM variant that uses nine transition parameters, and emission parameters associated with single character mappings between source and target language alphabets is identified and used in estimating transliteration similarity. The system resulted in a precision of 78% and recall of 83% when evaluated on a random selection of English-Russian Wikipedia topics.