A framework for retrieving Arabic documents based on queries written in Arabic slang language

Authors:
Mohammed Q. Shatnawi; Muneer Bani Yassein; Reem Mahafza
Venue:
Journal of Information Science
Year:
2012

Abstract

Because of the widespread use of the internet, large amounts of information and documents are available in many languages. Arabic is an important language in terms of both its usage and its structure. Search engines such as Google and Yahoo support searching in Arabic, yet they fail to return good results when the query contains slang terms. The Arabic language also poses difficulties of its own. The main goal of this research is to improve Arabic text-based search when queries contain Arabic slang terms. This research proposes a framework that enables users to query in their own slang and retrieve relevant documents posted in either form, slang or classical. The framework is designed and implemented around a context-free grammar that maps the user's slang queries to equivalent classical ones. On a classical dataset, classical-based queries improved the average values of precision, recall, and F-measure by 3% over slang-based queries. Slang-based queries improved the average values of these measures by 13% on a slang dataset and by 7% on a hybrid dataset.
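The abstract describes two technical steps: rewriting a slang query into its classical equivalent, and scoring the retrieved results with precision, recall, and F-measure. The sketch below illustrates both under stated assumptions. The paper's context-free grammar is not given here, so it is replaced by a plain word-for-word lookup table, which is a simpler technique than a grammar. The Levantine rules, the sample documents, and the names `slang_to_classical`, `retrieve`, and `precision_recall_f` are hypothetical. Only the metric definitions are the standard set-based ones.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical slang-to-classical rules (Levantine examples chosen for
# illustration only; they are NOT the grammar used in the paper).
SLANG_TO_CLASSICAL: Dict[str, str] = {
    "شو": "ماذا",   # "what"
    "صار": "حدث",   # "happened" (in classical Arabic "صار" means "became")
    "هلق": "الآن",  # "now"
}


def slang_to_classical(query: str, rules: Dict[str, str] = SLANG_TO_CLASSICAL) -> str:
    """Replace each slang token with its classical equivalent; keep other tokens."""
    return " ".join(rules.get(token, token) for token in query.split())


def retrieve(terms: Iterable[str], docs: Dict[str, str]) -> Set[str]:
    """Toy Boolean OR retrieval: ids of documents sharing at least one term."""
    wanted = set(terms)
    return {doc_id for doc_id, text in docs.items() if wanted & set(text.split())}


def precision_recall_f(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and balanced F-measure (F1)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical mini-collection with one classical and one slang document.
    docs = {
        "d1": "ماذا حدث الآن في المدينة",  # classical document
        "d2": "شو صار هلق بالبلد",          # slang document
        "d3": "الطقس جميل اليوم",           # unrelated document
    }
    slang_query = "شو صار هلق"
    classical_query = slang_to_classical(slang_query)
    # Searching with both forms reaches slang and classical documents alike.
    hits = retrieve(slang_query.split() + classical_query.split(), docs)
    print(classical_query, sorted(hits))
    print(precision_recall_f(hits, {"d1", "d2"}))
```

Querying with the original slang terms together with their classical rewrites is one simple way to reach documents posted in either form; whether the framework combines the two forms this way is an assumption of this sketch, not something the abstract states.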