Using N-grams for Arabic text searching

Authors:
Suleiman H. Mustafa; Qasem A. Al-Radaideh
Affiliations:
Department of Computer Information Systems, Yarmouk University, Irbid, Jordan (both authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2004

Abstract

N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free-text retrieval. It reports the results of applying the N-gram approach to a corpus of thousands of distinct textual words drawn from a number of sources representing various disciplines. The results indicate that the digram method performs better than the trigram method with respect to both conflation precision and conflation recall. In either case, however, the N-gram approach does not appear to be an efficient conflation technique, because the Arabic infix structure reduces the rate of correct N-gram matching.
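
To make the digram and trigram comparison concrete, the following is a minimal sketch of N-gram word matching. The abstract does not state which similarity measure the authors used; the Dice coefficient over unique N-grams is assumed here because it is a common choice for N-gram conflation. The function names `ngrams` and `dice_similarity` are hypothetical and do not come from the article.

```python
def ngrams(word, n):
    """Return the contiguous character n-grams of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of unique n-grams of two words.

    Assumption: the article's exact measure is not given in the abstract;
    Dice is used here only as an illustrative stand-in.
    """
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    if not grams_a or not grams_b:
        # A word shorter than n has no n-grams, so nothing can match.
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


# "kitab" (book) and its broken plural "kutub" (books), written without
# diacritics. The singular carries an infixed alif between root consonants.
singular = "كتاب"
plural = "كتب"

print(dice_similarity(singular, plural, n=2))  # 0.4
print(dice_similarity(singular, plural, n=3))  # 0.0
```

In this pair the infixed letter breaks every trigram and all but one digram, so two closely related words share few N-grams. This is one small illustration of the effect the abstract attributes to Arabic infix structure, not a reproduction of the article's experiment.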