Backward machine transliteration by learning phonetic similarity

Authors: Wei-Hao Lin; Hsin-Hsi Chen
Affiliations: Carnegie Mellon University, Pittsburgh, PA; National Taiwan University, Taipei, Taiwan
Venue: COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Year: 2002

Abstract

Many cross-lingual applications need to convert a transliterated word back into its original form. In this paper, we present a similarity-based framework for modeling backward transliteration and a learning algorithm that automatically acquires phonetic similarities from a corpus. The learning algorithm is based on the Widrow-Hoff rule with some modifications. Experimental results show that the learning algorithm converges quickly and that, on a corpus of 1574 pairs of English names and transliterated Chinese names, the method using acquired phonetic similarities remarkably outperforms previous methods that use pre-defined phonetic similarities or graphic similarities. The learning algorithm does not assume any underlying phonological structures or rules, and can be extended to other language pairs once a training corpus and a pronouncing dictionary are available.
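
To make the idea of learning phonetic similarities concrete, the following is a minimal sketch of the plain Widrow-Hoff (least-mean-squares) rule applied to phoneme-pair similarity scores. It is not the authors' algorithm: the abstract says only that their method uses this rule "with some modifications", which are not described here. The function names, the toy phoneme symbols, the pre-computed alignments, the 1.0/0.0 targets, and the learning rate are all assumptions made for illustration.

```python
from collections import Counter


def features(alignment):
    """Count each (source phoneme, target phoneme) pair in an alignment."""
    return Counter(alignment)


def score(weights, feats):
    """Similarity of a word pair: weighted sum of its aligned phoneme pairs."""
    return sum(weights.get(pair, 0.0) * count for pair, count in feats.items())


def lms_update(weights, feats, target, lr=0.1):
    """One Widrow-Hoff step: w <- w + lr * (target - w.x) * x."""
    error = target - score(weights, feats)
    for pair, count in feats.items():
        weights[pair] = weights.get(pair, 0.0) + lr * error * count
    return error


def train(examples, epochs=50, lr=0.1):
    """Repeatedly apply the update rule over all (alignment, target) examples."""
    weights = {}
    for _ in range(epochs):
        for alignment, target in examples:
            lms_update(weights, features(alignment), target, lr)
    return weights


# Hypothetical toy data: alignments are assumed to be given already.
# Target 1.0 marks a correct original/transliteration pair, 0.0 an incorrect one.
good_1 = [("l", "l"), ("i", "i"), ("n", "n")]
good_2 = [("r", "l"), ("i", "i")]
bad_1 = [("l", "k"), ("i", "a"), ("n", "s")]
examples = [(good_1, 1.0), (bad_1, 0.0), (good_2, 1.0)]

weights = train(examples)

# Backward transliteration as ranking: pick the candidate with the highest score.
candidates = {"good_1": good_1, "bad_1": bad_1}
best = max(candidates, key=lambda name: score(weights, features(candidates[name])))
print(best)
```

In this sketch the learned weights play the role of phonetic similarities: phoneme pairs that recur in correct name pairs gain weight, so a candidate original word whose phonemes align well with the transliterated word ranks above the alternatives. A real system would also need a pronouncing dictionary to obtain the phoneme sequences and a procedure to align them, both of which are assumed away here.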