Identifying the language of a text or utterance has a number of applications in natural language processing. Language identification has traditionally been approached with character-level language models. However, the accuracy of the language model approach depends crucially on the length of the text in question, which makes it poorly suited to very short inputs. In this paper, we consider the problem of identifying the language of names. We show that an approach based on SVMs with n-gram counts as features performs much better than language models on this task. We also experiment with applying the method to pre-process transliteration data for the training of separate models.
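To make the feature representation concrete, the following is a minimal sketch of how character n-gram counts might be extracted from names before they are passed to an SVM. It is an illustration under stated assumptions, not the setup used in the paper: the function names `ngram_counts` and `vectorize`, the maximum n-gram length of 3, the lowercasing, and the `^`/`$` boundary markers are all hypothetical choices.

```python
from collections import Counter


def ngram_counts(name, max_n=3, pad="^$"):
    """Count character n-grams (n = 1..max_n) in a single name.

    Assumption: the name is lowercased and wrapped in boundary markers
    so that prefixes and suffixes yield distinct n-grams.
    """
    start, end = pad
    text = start + name.lower() + end
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def vectorize(names, max_n=3):
    """Map names to sparse count vectors over a shared n-gram vocabulary.

    Returns the sorted vocabulary and, for each name, a dict from
    feature index to count (a sparse row that a linear SVM trainer
    could consume after conversion to its input format).
    """
    per_name = [ngram_counts(name, max_n) for name in names]
    vocab = sorted(set().union(*per_name))
    index = {gram: j for j, gram in enumerate(vocab)}
    rows = [{index[g]: c for g, c in counts.items()} for counts in per_name]
    return vocab, rows


if __name__ == "__main__":
    vocab, rows = vectorize(["Anna", "Ito"], max_n=2)
    print(len(vocab), "features")
    print(rows[0])
```

Each row is a sparse vector of counts; pairing the rows with language labels would give training data for any standard SVM implementation. Because a name contains only a handful of n-grams, the vectors are very sparse, which is one reason a linear classifier is a natural fit here.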