Towards enhancing retrieval effectiveness of search engines for diacritisized Arabic documents

Authors: Bassam H. Hammo
Affiliations: King Abdullah II School for Information Technology, University of Jordan, Amman, Jordan 11942
Venue: Information Retrieval
Year: 2009

Abstract

Most Arabic text available on the web is written without short vowels (diacritics). Diacritics are commonly used in religious scripts such as the Holy Quran (the book of Islam) and Al-Hadith (the teachings of Prophet Mohammad (PBUH)), in children's literature, and in some words where ambiguity of articulation might arise. Arabic Internet users may fail to retrieve credible sources of Arabic text if their queries do not match the diacritical marks attached to the words in the collection. However, typing the diacritical marks is tedious and time consuming. The alternative is to ignore these marks, which introduces ambiguity. Previous work suggested pre-processing Arabic text to remove the diacritical marks before indexing. Consequently, there are noticeable discrepancies when searching the web for Arabic text using international search engines such as Google and Yahoo. In this article, we propose a framework that enhances the retrieval effectiveness of search engines for both diacritized and diacritic-less Arabic text through query expansion techniques. We used a rule-based stemmer and a semantic relational database compiled in an experimental thesaurus to perform the expansion. We tested our approach on the scripts of the Quran. We found that query expansion for searching Arabic text is promising, and that its efficiency can likely be improved further with advanced natural language processing tools.
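
The pre-processing step attributed to previous work, removing diacritical marks so that diacritized and diacritic-less forms of a word compare as equal, can be illustrated with a minimal Python sketch. This is not the framework proposed in the article, which relies on a rule-based stemmer and a thesaurus for query expansion. The function names are hypothetical, and the sketch assumes that only the basic Arabic marks in the Unicode range U+064B to U+0652 need to be handled.

```python
import re

# Basic Arabic diacritical marks: U+064B (FATHATAN) through U+0652 (SUKUN).
# Assumption: only this range is handled; Quranic annotation signs and the
# superscript alef (U+0670) are outside the scope of this sketch.
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Return the text with the basic Arabic diacritical marks removed."""
    return DIACRITICS.sub("", text)


def matches(query: str, document: str) -> bool:
    """Check whether the query occurs in the document, ignoring diacritics on both sides."""
    return strip_diacritics(query) in strip_diacritics(document)


# Hypothetical usage: the three-letter word kaf-ta-ba, with and without fatha marks.
diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = "\u0643\u062A\u0628"
found = matches(bare, diacritized)
```

Stripping marks on both the query and the document side makes matching insensitive to diacritics, at the cost of the ambiguity the abstract describes: distinct words that differ only in their short vowels become indistinguishable.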