Robust ending guessing rules with application to Slavonic languages

Authors:
Preslav Nakov; Elena Paskaleva
Affiliations:
University of California, Berkeley, CA; Bulgarian Academy of Sciences, Sofia, Bulgaria
Venue:
ROMAND '04 Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data
Year:
2004

Citing 11

Guessing morphology from terms and corpora
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval

Treatment of Unknown Words
WIA '99 Revised Papers from the 4th International Workshop on Automata Implementation

Coping with ambiguity and unknown words through probabilistic models
Computational Linguistics - Special issue on using large corpora: II

Automatic rule induction for unknown-word guessing
Computational Linguistics

A practical part-of-speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing

Memory-based morphological analysis
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Minimally supervised morphological analysis by multimodal alignment
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

Language independent, minimally supervised induction of lexical probabilities
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

Knowledge-free induction of morphology using latent semantic analysis
ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7

Minimal commitment and full lexical disambiguation: balancing rules and hidden Markov Models
ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7

Morphemes as necessary concept for structures discovery from untagged corpora
NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning

Abstract

The paper studies the automatic extraction of diagnostic word endings for Slavonic languages, with the aim of determining some grammatical, morphological and semantic properties of the underlying word. In particular, ending guessing rules are learned from a large morphological dictionary of Bulgarian in order to predict POS, gender, number, article and semantics. A simple, exact, high-accuracy algorithm is developed and compared to an approximate one that uses a scoring function previously proposed by Mikheev for POS guessing. It is shown how the number of rules of the latter can be reduced by a factor of up to 35 without sacrificing performance. The evaluation demonstrates coverage close to 100% and precision of 97-99% for the approximate algorithm.
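
The abstract does not spell out the algorithms, so the following is only a minimal sketch of what an ending guessing rule learner might look like, not the authors' method. It assumes an "exact" rule is a word ending for which every dictionary word with that ending carries the same tag. The function names, the `max_len` and `min_support` parameters, and the toy English lexicon are all hypothetical and chosen purely for illustration; Mikheev's scoring function is not reproduced here.

```python
from collections import defaultdict


def learn_exact_rules(lexicon, max_len=4, min_support=2):
    """Collect endings that map to exactly one tag in the lexicon.

    lexicon: dict mapping word -> tag (hypothetical toy format).
    max_len: longest ending (in characters) to consider.
    min_support: minimum number of lexicon words sharing the ending.
    """
    tags_by_ending = defaultdict(set)
    support = defaultdict(int)
    for word, tag in lexicon.items():
        # Leave at least one character of the word outside the ending.
        for k in range(1, min(max_len, len(word) - 1) + 1):
            ending = word[-k:]
            tags_by_ending[ending].add(tag)
            support[ending] += 1
    # Keep only unambiguous endings with enough supporting words.
    return {
        ending: next(iter(tags))
        for ending, tags in tags_by_ending.items()
        if len(tags) == 1 and support[ending] >= min_support
    }


def guess(word, rules, max_len=4):
    """Return the tag of the longest matching ending, or None."""
    for k in range(min(max_len, len(word)), 0, -1):
        tag = rules.get(word[-k:])
        if tag is not None:
            return tag
    return None


# Made-up toy lexicon (not data from the paper).
lexicon = {
    "walking": "V", "talking": "V",
    "table": "N", "cable": "N",
    "quickly": "ADV", "slowly": "ADV",
}
rules = learn_exact_rules(lexicon)
print(guess("jumping", rules))
print(guess("sadly", rules))
print(guess("xyz", rules))
```

Preferring the longest matching ending is one simple way to resolve overlapping rules. An approximate learner would instead tolerate some ambiguity and rank endings by a score, which is where a function such as Mikheev's would come in.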