We present a loosely supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require any lexicon or linguistic analysis tool for the source language (Hebrew, in our case). It also does not require any manually annotated training data: we learn from noisy data acquired by over-generation. We report precision/recall results of 80/82 for a corpus of 4044 unique words, containing 368 foreign words.