Identification of transliterated foreign words in Hebrew script

Authors:
Yoav Goldberg;Michael Elhadad
Affiliations:
Computer Science Department, Ben Gurion University of the Negev, Be'er Sheva, Israel (both authors)
Venue:
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
Year:
2008

Abstract

We present a loosely-supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require any lexicon or linguistic analysis tool for the source language (Hebrew, in our case). It also requires no manually annotated training data: we learn from noisy data acquired by over-generation. We report precision/recall of 80%/82% on a corpus of 4044 unique words, of which 368 are foreign.
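
To make the reported figures concrete, the following is a minimal sketch of how precision and recall for the "foreign word" class relate to the corpus sizes given in the abstract. The counts of true positives, false positives and false negatives are not stated in the abstract; the values below are approximations back-computed from the rounded percentages and should be read as an illustration only. The function and variable names are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for the 'foreign word' class."""
    flagged = true_positives + false_positives   # words the method labels foreign
    actual = true_positives + false_negatives    # words that really are foreign
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Figures taken from the abstract.
UNIQUE_WORDS = 4044
FOREIGN_WORDS = 368
REPORTED_PRECISION = 0.80
REPORTED_RECALL = 0.82

# ASSUMPTION: approximate counts implied by the rounded percentages.
# These are not reported in the abstract.
tp = round(REPORTED_RECALL * FOREIGN_WORDS)   # foreign words correctly found
flagged = round(tp / REPORTED_PRECISION)      # all words flagged as foreign
fp = flagged - tp                             # native words wrongly flagged
fn = FOREIGN_WORDS - tp                       # foreign words missed

precision, recall = precision_recall(tp, fp, fn)
base_rate = FOREIGN_WORDS / UNIQUE_WORDS      # share of foreign words in the corpus

print(f"foreign-word base rate: {base_rate:.1%}")
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

Since foreign words make up only about 9% of the unique words, a trivial classifier that labels every word as native would be right most of the time yet have zero recall, which is presumably why the results are given as precision and recall rather than accuracy.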