Finding variants of out-of-vocabulary words in Arabic

Authors:
Abdusalam F. A. Nwesri;S. M. M. Tahaghoghi;Falk Scholer
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
Semitic '07 Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources
Year:
2007

Citing 12
Cited 0

A critical investigation of recall and precision as measures of retrieval system performance
ACM Transactions on Information Systems (TOIS)

PHOENIX: the algorithm
Program: Automated Library and Information Systems

Approximate string-matching with q-grams and maximal matches
Theoretical Computer Science - Selected papers of the Combinatorial Pattern Matching School

Finding approximate matches in large lexicons
Software—Practice & Experience

Phonetic string matching: lessons from information retrieval
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval

Retrieval effectiveness of proper name search methods
Information Processing and Management: an International Journal

The String-to-String Correction Problem
Journal of the ACM (JACM)

Approximate String Matching
ACM Computing Surveys (CSUR)

Statistical transliteration for English-Arabic cross language information retrieval
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

On the development of name search techniques for Arabic
Journal of the American Society for Information Science and Technology

Capturing out-of-vocabulary words in Arabic text
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

Translating names and technical terms in Arabic text
Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages

Abstract

Transliteration of a word into another language often leads to multiple spellings. Unless an information retrieval system recognises different forms of transliterated words, a significant number of documents will be missed when users specify only one spelling variant. Using two different datasets, we evaluate several approaches to finding variants of foreign words in Arabic, and show that the longest common subsequence (LCS) technique is the best overall.
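
To make the LCS idea concrete, the following is a minimal sketch of how longest-common-subsequence similarity could be used to rank candidate spelling variants. It is not the authors' implementation: the function names, the normalisation by the longer word's length, and the 0.75 threshold are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
from typing import Iterable, List, Tuple


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer word's length (assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def find_variants(
    query: str, lexicon: Iterable[str], threshold: float = 0.75
) -> List[Tuple[str, float]]:
    """Return lexicon words whose similarity to query meets the threshold.

    The threshold value is hypothetical; results are sorted best first.
    """
    scored = [(word, lcs_similarity(query, word)) for word in lexicon]
    kept = [(word, score) for word, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Two common Arabic spellings of "internet", plus an unrelated word.
    for word, score in find_variants("انترنت", ["انترنيت", "كتاب"]):
        print(word, round(score, 3))
```

Comparison here is per Unicode code point, so a real system would probably normalise the text first (for example, unifying alef forms and stripping diacritics); that step is left out of this sketch.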