Most Arabic text available on the web is written without short vowels (diacritics). Diacritics are commonly used in religious texts such as the Holy Quran (the book of Islam) and Al-Hadith (the teachings of Prophet Mohammad (PBUH)), in children's literature, and in some words where ambiguity of articulation might arise. Arabic-speaking Internet users may fail to retrieve credible sources of Arabic text if their queries do not match the diacritical marks attached to the words in the collection. However, typing the diacritical marks is tedious and time-consuming. The alternative is to ignore these marks and face the problem of ambiguity. Previous work suggested pre-processing Arabic text to remove the diacritical marks before indexing. Consequently, there are noticeable discrepancies when searching the web for Arabic text using international search engines such as Google and Yahoo. In this article, we propose a framework that enhances the retrieval effectiveness of search engines for both diacritized and diacritic-less Arabic text through query expansion techniques. We used a rule-based stemmer and a semantic relational database compiled in an experimental thesaurus to perform the expansion. We tested our approach on the text of the Quran. We found that query expansion for searching Arabic text is promising, and that its efficiency can likely be further improved with advanced natural language processing tools.
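The diacritic-removal pre-processing attributed to previous work, and the ambiguity it introduces, can be illustrated with a minimal Python sketch. This is not the framework proposed in the article. The function names `strip_diacritics` and `same_skeleton` are hypothetical, and the set of marks removed (Unicode U+064B to U+0652 plus the superscript alef U+0670) is an assumption about what counts as a diacritic for matching purposes.

```python
import re

# Assumption: treat tanwin, fatha, damma, kasra, shadda, and sukun
# (U+064B..U+0652) plus superscript alef (U+0670) as the marks to remove.
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the text with Arabic diacritical marks removed."""
    return _DIACRITICS.sub("", text)


def same_skeleton(a: str, b: str) -> bool:
    """True when two words are identical once diacritics are ignored."""
    return strip_diacritics(a) == strip_diacritics(b)


# Two different words that share one consonant skeleton:
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba" (he wrote)
kutub = "\u0643\u064F\u062A\u064F\u0628"         # "kutub" (books)
skeleton = "\u0643\u062A\u0628"                   # undiacritized form

print(strip_diacritics(kataba) == skeleton)  # both reduce to the same string
print(same_skeleton(kataba, kutub))
```

Stripping the marks lets an undiacritized query match diacritized text, but it also collapses distinct words into one form, which is the ambiguity the abstract describes.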