Introduction to Automata Theory, Languages and Computability
Unsupervised learning of the morphology of a natural language. Computational Linguistics
Proceedings of the ninth ACM SIGPLAN international conference on Functional programming
Unsupervised learning of morphology for building lexicon for a highly inflectional language. MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Automatic acquisition of inflectional lexica for morphological normalisation. Information Processing and Management: an International Journal
Large-coverage root lexicon extraction for Hindi. EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Smart paradigms and the predictability and complexity of inflectional morphology. EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
The tool extract enables the automatic extraction of lemma-paradigm pairs from raw text data. It uses search patterns built from regular expressions and propositional logic. These patterns state sufficient conditions, based on the word forms occurring in the data, for including a lemma-paradigm pair in the lexicon. This paper explains the search pattern syntax of extract and its search algorithm, and discusses how to design search patterns with respect to recall and precision. The extract tool was developed for morphologies defined in the Functional Morphology tool [1], but it can be used with any system that implements a word-and-paradigm description of a morphology. The usefulness of the tool is demonstrated by a case study on the Canadian Hansards Corpus of French. The result is evaluated in terms of the precision of the extracted lemmas, together with statistics on coverage and rule productivity. Competitive extraction figures show that writing rules by hand in a tailored tool is a time-efficient approach to the task at hand.
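The kind of search described above can be illustrated with a small sketch. It is not extract's actual pattern syntax or implementation: the names Has, And, Or, Not, Rule and extract are hypothetical, and the sketch assumes for simplicity that every word form of a paradigm is the stem plus a fixed suffix (paradigms in Functional Morphology are more general than this). A rule pairs a head regular expression, whose first group captures a candidate stem, with a propositional formula over word forms; a lemma-paradigm pair is emitted only when the formula holds against the set of word forms found in the corpus.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Set, Tuple, Union

# Hypothetical propositional formulas over word forms (not extract's syntax).


@dataclass(frozen=True)
class Has:
    """Atom: the form stem + suffix occurs in the corpus."""
    suffix: str


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Not:
    arg: "Formula"


Formula = Union[Has, And, Or, Not]


def holds(f: Formula, stem: str, words: Set[str]) -> bool:
    """Evaluate formula f for one candidate stem against the corpus vocabulary."""
    if isinstance(f, Has):
        return stem + f.suffix in words
    if isinstance(f, And):
        return holds(f.left, stem, words) and holds(f.right, stem, words)
    if isinstance(f, Or):
        return holds(f.left, stem, words) or holds(f.right, stem, words)
    if isinstance(f, Not):
        return not holds(f.arg, stem, words)
    raise TypeError(f"unknown formula: {f!r}")


@dataclass(frozen=True)
class Rule:
    paradigm: str       # name of the target paradigm (hypothetical)
    head: str           # regex matched against whole words; group 1 is the stem
    lemma_suffix: str   # assumption: lemma = stem + lemma_suffix
    condition: Formula  # sufficient condition for adding the pair


def extract(rules: Iterable[Rule], words: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return every (lemma, paradigm) pair whose rule condition holds."""
    vocab = set(words)
    found: Set[Tuple[str, str]] = set()
    for rule in rules:
        pattern = re.compile(rule.head)
        for word in vocab:
            m = pattern.fullmatch(word)
            if m is None:
                continue  # the head pattern does not apply to this word
            stem = m.group(1)
            if holds(rule.condition, stem, vocab):
                found.add((stem + rule.lemma_suffix, rule.paradigm))
    return found


# Toy data: a few French word forms and two simplified verb rules.
CORPUS = "parler parlons parlez mer finir finissons chanter".split()
RULES = [
    Rule("verb_er", r"(.+)er", "er", Or(Has("ons"), Has("ez"))),
    Rule("verb_ir", r"(.+)ir", "ir", Has("issons")),
]

if __name__ == "__main__":
    for lemma, paradigm in sorted(extract(RULES, CORPUS)):
        print(lemma, paradigm)
    # prints:
    # finir verb_ir
    # parler verb_er
```

With this toy corpus, mer and chanter are not extracted because neither of their "-ons" or "-ez" forms occurs. Requiring more forms in a condition in this way raises precision at the cost of recall, which is the trade-off that search pattern design has to balance.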