Guessing morphology from terms and corpora
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
WIA '99 Revised Papers from the 4th International Workshop on Automata Implementation
Coping with ambiguity and unknown words through probabilistic models
Computational Linguistics - Special issue on using large corpora: II
Automatic rule induction for unknown-word guessing
Computational Linguistics
A practical part-of-speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing
Memory-based morphological analysis
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Minimally supervised morphological analysis by multimodal alignment
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Language independent, minimally supervised induction of lexical probabilities
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Knowledge-free induction of morphology using latent semantic analysis
CoNLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Minimal commitment and full lexical disambiguation: balancing rules and hidden Markov Models
CoNLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Morphemes as necessary concept for structures discovery from untagged corpora
NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
The paper studies the automatic extraction of diagnostic word endings for Slavonic languages, that is, endings that help determine some grammatical, morphological and semantic properties of the underlying word. In particular, ending-guessing rules are learned from a large morphological dictionary of Bulgarian in order to predict POS, gender, number, article and semantics. A simple, exact, high-accuracy algorithm is developed and compared to an approximate one that uses a scoring function previously proposed by Mikheev for POS guessing. It is shown how the number of rules of the latter can be reduced by a factor of up to 35 without sacrificing performance. The evaluation demonstrates coverage close to 100% and precision of 97-99% for the approximate algorithm.
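To make the idea of ending-guessing rules concrete, the following is a minimal sketch in Python, not the paper's algorithm. It assumes that an "exact" rule is an ending whose dictionary words all carry the same tag, and that the longest matching ending wins at guessing time. The function names, the `max_len` and `min_count` parameters, and the toy English lexicon are all hypothetical and chosen only for illustration; the paper itself works with a Bulgarian morphological dictionary.

```python
from collections import defaultdict


def learn_ending_rules(lexicon, max_len=4, min_count=2):
    """Learn ending -> tag rules from (word, tag) pairs.

    Assumption (not from the paper): a rule is kept only if every
    lexicon word with that ending has the same tag and the ending
    occurs at least `min_count` times.
    """
    tags_by_ending = defaultdict(set)
    count_by_ending = defaultdict(int)
    for word, tag in lexicon:
        # Consider proper suffixes only, up to max_len characters.
        for k in range(1, min(max_len, len(word) - 1) + 1):
            ending = word[-k:]
            tags_by_ending[ending].add(tag)
            count_by_ending[ending] += 1
    return {
        ending: next(iter(tags))
        for ending, tags in tags_by_ending.items()
        if len(tags) == 1 and count_by_ending[ending] >= min_count
    }


def guess(word, rules, max_len=4):
    """Return the tag of the longest matching ending, or None."""
    for k in range(min(max_len, len(word) - 1), 0, -1):
        tag = rules.get(word[-k:])
        if tag is not None:
            return tag
    return None


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = [
    ("walking", "VERB"),
    ("talking", "VERB"),
    ("quickly", "ADV"),
    ("slowly", "ADV"),
    ("king", "NOUN"),
]

rules = learn_ending_rules(toy_lexicon)
print(sorted(rules.items()))
print(guess("baking", rules), guess("badly", rules), guess("ring", rules))
```

In this toy run the ending `ing` yields no rule because it is shared by a noun and two verbs, so a word like `ring` is left unguessed. That is the trade-off an exact rule set makes: unambiguous rules at the cost of coverage, which is where a scored, approximate rule set such as the one the abstract describes would come in.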