Empirical studies in strategies for Arabic retrieval

Authors:
Jinxi Xu;Alexander Fraser;Ralph Weischedel
Affiliations:
BBN Technologies, Cambridge, MA;USC/ISI, Marina del Rey, CA;BBN Technologies, Cambridge, MA
Venue:
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2002

Abstract

This work evaluates several search strategies for Arabic monolingual and cross-lingual retrieval, using the TREC Arabic corpus as the test-bed. The release by NIST in 2001 of an Arabic corpus of nearly 400k documents, with both monolingual and cross-lingual queries and relevance judgments, has made such empirical studies possible. Experimental results show that spelling normalization and stemming can significantly improve Arabic monolingual retrieval. Character tri-grams from stems improved retrieval modestly on the test corpus, but the improvement is not statistically significant. To further improve retrieval, we propose a novel thesaurus-based technique. Unlike existing approaches to thesaurus-based retrieval, ours formulates word synonyms as probabilistic term translations that can be automatically derived from a parallel corpus. Retrieval results show that the thesaurus can significantly improve Arabic monolingual retrieval. For cross-lingual retrieval (CLIR), we found that spelling normalization and stemming have little impact.
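
The three techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The normalization rules are common conventions for Arabic text and are assumed here, since the abstract does not list the ones used. The synonym step assumes the "probabilistic term translations" are obtained by pivoting through the other language of the parallel corpus, P(b|a) = sum over e of P(b|e) P(e|a); the abstract only says they are derived from a parallel corpus. All function names and the toy probabilities are hypothetical.

```python
import re
from collections import defaultdict

# Arabic short-vowel diacritics (U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(word):
    """Spelling normalization under ASSUMED rules (not taken from the paper)."""
    word = _DIACRITICS.sub("", word)
    # Alef with madda / hamza above / hamza below -> bare alef.
    word = re.sub("[\u0622\u0623\u0625]", "\u0627", word)
    # Word-final alef maqsura -> ya.
    word = re.sub("\u0649$", "\u064A", word)
    # Word-final ta marbuta -> ha.
    word = re.sub("\u0629$", "\u0647", word)
    return word


def char_trigrams(stem):
    """Overlapping character tri-grams of a stem; short stems are kept whole."""
    if len(stem) < 3:
        return [stem]
    return [stem[i:i + 3] for i in range(len(stem) - 2)]


def synonym_probs(p_pivot_given_term, p_term_given_pivot):
    """Synonym probabilities via a pivot language (ASSUMED formulation).

    P(b | a) = sum over pivot words e of P(b | e) * P(e | a),
    where both tables would be estimated from a parallel corpus.
    """
    out = defaultdict(lambda: defaultdict(float))
    for a, pivots in p_pivot_given_term.items():
        for e, p_e_a in pivots.items():
            for b, p_b_e in p_term_given_pivot.get(e, {}).items():
                out[a][b] += p_b_e * p_e_a
    return {a: dict(bs) for a, bs in out.items()}


if __name__ == "__main__":
    # Transliterated stem used only to keep the demo readable.
    print(char_trigrams("ktab"))
    # Toy translation tables with made-up probabilities.
    toy_forward = {"a1": {"e1": 1.0}}
    toy_backward = {"e1": {"a1": 0.5, "a2": 0.5}}
    print(synonym_probs(toy_forward, toy_backward))
```

In a retrieval system along these lines, normalization and stemming would be applied to both documents and queries before indexing, tri-grams would replace or supplement stems as index terms, and the synonym table would let a query term match documents containing its likely synonyms, weighted by probability.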