MULTEXT: Multilingual Text Tools and Corpora
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
The MULTEXT-east morphosyntactic specifications for Slavic languages
MorphSlav '03 Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages
OWL/DL formalization of the MULTEXT-East morphosyntactic specifications
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
MULTEXT-East: morphosyntactic resources for Central and Eastern European languages
Language Resources and Evaluation
Farsi, also known as Persian, is the official language of Iran and Tajikistan and one of the two main languages spoken in Afghanistan. It is an Indo-European agglutinating language written in the Arabic script. This paper presents the first step in creating a basic language resource kit for Farsi. This step comprises the specifications for morphosyntactic encoding, which are based on the EAGLES/MULTEXT model and on specific resources of MULTEXT-East. The paper introduces the language, with an emphasis on its writing system and morphological properties, and presents its morphosyntactic specifications. It also introduces two other important issues: a novel Part of Speech (PoS) categorization and a unified orthography for Farsi in digital environments. A lexicon and an annotated corpus are under preparation.