The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms from one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming takes place. In this paper, we investigate three approaches to identifying foreign words in Arabic text: lexicons, language patterns, and n-grams. Our results show that lexicon-based approaches outperform the other techniques.
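To make two of the approaches named above concrete, the following is a minimal sketch, not the paper's actual method. It assumes a word counts as native if it appears in a lexicon, and otherwise compares add-one-smoothed character trigram scores under a native profile and a foreign profile. The function names, the romanized toy word lists, and the choice of trigrams with add-one smoothing are all illustrative assumptions; a real system would work on Arabic script with far larger training lists.

```python
from collections import Counter
import math


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded with boundary marks."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profile(words, n=3):
    """Count character n-grams over a list of training words."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts


def log_score(word, profile, vocab_size, n=3):
    """Sum of add-one-smoothed log probabilities of the word's n-grams."""
    total = sum(profile.values())
    return sum(
        math.log((profile[g] + 1) / (total + vocab_size))
        for g in char_ngrams(word, n)
    )


def is_foreign(word, lexicon, native_profile, foreign_profile, n=3):
    """Lexicon lookup first; fall back to comparing n-gram scores."""
    if word in lexicon:
        return False
    vocab_size = len(set(native_profile) | set(foreign_profile))
    foreign = log_score(word, foreign_profile, vocab_size, n)
    native = log_score(word, native_profile, vocab_size, n)
    return foreign > native


# Hypothetical romanized toy data, for illustration only.
NATIVE_WORDS = ["kitab", "maktab", "katib", "maktaba"]
FOREIGN_WORDS = ["kombyutar", "tilifizyun", "internet", "tilifun"]

LEXICON = set(NATIVE_WORDS)
NATIVE_PROFILE = train_profile(NATIVE_WORDS)
FOREIGN_PROFILE = train_profile(FOREIGN_WORDS)

if __name__ == "__main__":
    for candidate in ["kitab", "katab", "internet"]:
        verdict = is_foreign(candidate, LEXICON, NATIVE_PROFILE, FOREIGN_PROFILE)
        print(candidate, "foreign" if verdict else "native")
```

In this sketch the lexicon check runs first, so the n-gram comparison only decides words the lexicon does not cover. A pipeline along these lines could then skip stemming for any word flagged as foreign.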