An accuracy-enhanced light stemmer for Arabic text

Authors:
Samhaa R. El-Beltagy; Ahmed Rafea
Affiliations:
Cairo University, Giza, Egypt; The American University in Cairo
Venue:
ACM Transactions on Speech and Language Processing (TSLP)
Year:
2010

Abstract

Stemming is a key step in most text mining and information retrieval applications. Information extraction, semantic annotation, and ontology learning are but a few examples of tasks in which a stemmer is essential. While light stemmers have proven highly effective for Arabic information retrieval, they fall short of the accuracy required by many text mining applications. This shortfall can be attributed to two factors: light stemmers apply their rules indiscriminately, and they do not stem broken plurals at all, even though this class of plurals is very common in Arabic text. The goal of this work is to overcome these limitations. The evaluation shows that the work significantly improves stemming accuracy, and that improving stemming accuracy can in turn significantly improve tasks such as automatic annotation and keyphrase extraction.
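The two limitations named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not describe. The affix lists, the `KNOWN_STEMS` lexicon, the `BROKEN_PLURALS` table, and all function names are hypothetical and chosen only for illustration. `light_stem` strips affixes unconditionally, while `checked_stem` accepts a stripped form only when a lookup supports it.

```python
# Illustrative sketch only: every list, table, and name below is an assumption,
# not taken from the paper.

# Small affix lists in the style of common Arabic light stemmers (longest first).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 3  # assumed lower bound on what may remain after stripping

# Hypothetical lexicon of valid stems and a broken-plural lookup table.
KNOWN_STEMS = {"معلم", "ليمون", "رجل"}
BROKEN_PLURALS = {"رجال": "رجل"}  # "men" -> "man"


def strip_prefix(word: str) -> str:
    """Remove the first matching prefix if enough of the word remains."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[:-len(suffix)]
    return word


def light_stem(word: str) -> str:
    """Indiscriminate light stemming: strip affixes with no validation."""
    return strip_suffix(strip_prefix(word))


def checked_stem(word: str) -> str:
    """Try the most aggressively stripped form first, but accept a candidate
    only if it is a known broken plural or a known stem."""
    no_prefix = strip_prefix(word)
    for candidate in (strip_suffix(no_prefix), no_prefix, word):
        if candidate in BROKEN_PLURALS:
            return BROKEN_PLURALS[candidate]
        if candidate in KNOWN_STEMS:
            return candidate
    return word  # nothing could be verified, so leave the word unchanged


if __name__ == "__main__":
    for w in ["المعلمون", "ليمون", "الرجال"]:
        print(w, light_stem(w), checked_stem(w))
```

With these toy tables, "المعلمون" (the teachers) is reduced to "معلم" by both functions. The indiscriminate stemmer truncates "ليمون" (lemon) to "ليم" because the word happens to end in a plural-like suffix, and it leaves the broken plural in "الرجال" (the men) as "رجال". The checked variant keeps "ليمون" intact and maps "رجال" to its singular "رجل". Accuracy in this design depends entirely on the coverage of the lookup tables, which is the main cost of the approach.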