Named entity transcription with pair n-gram models

Authors:
Martin Jansche; Richard Sproat
Affiliations:
Google Inc.; Google Inc. and OHSU
Venue:
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Year:
2009

Abstract

We submitted results for each of the eight shared tasks. Except for Japanese name kanji restoration, which uses a noisy channel model, our Standard Run submissions were produced by generative long-range pair n-gram models, which we mostly augmented with publicly available data (either from LDC datasets or mined from Wikipedia) for the Non-Standard Runs.
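
To make the term "pair n-gram model" concrete, the following is a minimal sketch of the simplest case: a bigram model whose symbols are aligned (source, target) character pairs, so that one n-gram model scores the source and target strings jointly. It is an illustration under stated assumptions, not the authors' system: the one-to-one alignments, the toy Latin-to-Greek data, the add-one smoothing, and all function names are hypothetical, and the abstract describes long-range models rather than bigrams.

```python
import math
from collections import Counter

# Sentinel pairs marking the start and end of a name (hypothetical choice).
BOS = ("<s>", "<s>")
EOS = ("</s>", "</s>")


def train_pair_bigram(aligned_names):
    """Count bigrams over sequences of (source, target) symbol pairs."""
    context_counts = Counter()  # how often each pair occurs as a history
    bigram_counts = Counter()   # how often (history, next pair) occurs
    vocab = {EOS}               # pairs that can be predicted
    for pairs in aligned_names:
        seq = [BOS] + list(pairs) + [EOS]
        vocab.update(pairs)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def log_prob(pairs, model):
    """Joint log-probability of an aligned pair sequence (add-one smoothed)."""
    context_counts, bigram_counts, vocab = model
    seq = [BOS] + list(pairs) + [EOS]
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


# Toy, hand-aligned training data (assumption: alignments are given 1-to-1).
toy_data = [
    [("a", "α"), ("n", "ν"), ("a", "α")],
    [("n", "ν"), ("i", "ι"), ("n", "ν"), ("a", "α")],
]
model = train_pair_bigram(toy_data)

seen = [("a", "α"), ("n", "ν"), ("a", "α")]
unseen = [("n", "ν"), ("n", "ν"), ("a", "α")]
print(log_prob(seen, model), log_prob(unseen, model))
```

In this sketch, transliterating a new name would mean searching over candidate target strings and alignments for the pair sequence with the highest joint score. That search, and the alignment step that produces the training pairs, are omitted here.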