A POS-Based Word Prediction System for the Persian Language

Authors:
Masood Ghayoomi;Ehsan Daroodi
Affiliations:
Nancy 2 University, Nancy, France;Iran National Science Foundation, Tehran, Iran
Venue:
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Year:
2008

Abstract

Word prediction is the task of guessing which words are likely to follow a given text segment and displaying a list of the most probable words for that position. In this research, we designed and implemented three word predictors for Persian. Our baseline is a statistical system that uses language models. The first system uses word statistics; the second uses the main syntactic categories of a Persian POS-tagged corpus; and the third uses the main syntactic categories along with their morphological, syntactic and semantic subcategories. Using KeyStroke Saving (KSS) as the most important metric for evaluating system performance, the primary word-based statistical system achieved 37% KSS, and the second system, which used only the main syntactic categories together with word statistics, achieved 38.95% KSS. Our last system, which used all of the information available for the words, achieved the best result with 42.45% KSS.
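
To make the KSS metric concrete, the following is a minimal sketch of how keystroke saving could be measured for a word-based predictor. It is not the authors' implementation. It assumes a plain word-bigram model, a prediction list of five words, and a keystroke accounting in which every letter costs one keystroke, a typed word is followed by one space keystroke, and selecting a predicted word costs one keystroke that also inserts the space. The function names (`train_bigrams`, `predict`, `keystroke_saving`) and the English toy corpus are hypothetical, and the POS-based systems described above are not modelled.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count word bigrams: maps a previous word to a Counter of next words."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model


def predict(model, prev, prefix, n=5):
    """Return up to n words seen after `prev` that start with `prefix`,
    most frequent first (ties broken alphabetically)."""
    candidates = model.get(prev, Counter())
    matches = [(w, c) for w, c in candidates.items() if w.startswith(prefix)]
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:n]]


def keystroke_saving(model, tokens, n=5):
    """Simulate typing `tokens` and return KSS as a percentage.

    Assumed accounting: without prediction a word costs len(word) + 1
    keystrokes (letters plus space); with prediction it costs the letters
    typed before the word shows up in the list, plus 1 to select it.
    """
    baseline = 0
    with_prediction = 0
    prev = None
    for word in tokens:
        full_cost = len(word) + 1
        baseline += full_cost
        cost = full_cost
        for typed in range(len(word)):
            if word in predict(model, prev, word[:typed], n):
                cost = typed + 1
                break
        with_prediction += cost
        prev = word
    return 100.0 * (baseline - with_prediction) / baseline


# Toy example (hypothetical data, English for readability).
training = "the cat sat on the mat the cat ran".split()
model = train_bigrams(training)
kss = keystroke_saving(model, "the cat sat".split())
print(f"KSS = {kss:.1f}%")  # prints "KSS = 50.0%"
```

In this toy run the first word has no left context and must be typed in full (4 keystrokes), while the next two words appear in the list immediately (1 keystroke each), so 6 of 12 keystrokes are saved. A real evaluation would use held-out text rather than text seen in training.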