Ripple Down Rule learning for automated word lemmatisation

Authors:
Joël Plisson;Nada Lavrač;Dunja Mladenić;Tomaž Erjavec
Affiliations:
Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: joel.plisson@ijs.si;Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: joel.plisson@ijs.si and University of Nova Gorica, Vipavska 13, 5000 Nova Gorica, Slovenia;Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: joel.plisson@ijs.si;Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia. E-mail: joel.plisson@ijs.si
Venue:
AI Communications
Year:
2008

Abstract

Lemmatisation is the process of finding the normalised forms of wordforms as they appear in text. It is a useful pre-processing step for a large number of language engineering tasks, and is especially important for languages with rich inflectional morphology. This paper presents a machine learning approach to automated word lemmatisation using a Ripple Down Rule (RDR) learning algorithm, specially adapted to this task. By focusing on word suffixes, the induced Ripple Down Rules determine which wordform suffix should be removed and/or added to generate the lemma. The rules, induced from a lexicon of lemmatised Slovene words, were evaluated by cross-validation on the lexicon and on a hand-validated annotated corpus, and compared to previous work using two other inductive lemmatisers, ATRIS and CLOG. We show that RDR outperforms ATRIS and is more flexible than CLOG, as it can, unlike CLOG, also work without prior part-of-speech tagging. The RDR lemmatiser is easy to train and use for new languages and is, together with CLOG, available via a Web service.
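
To make the suffix-rewriting idea concrete, the following minimal Python sketch shows how a Ripple Down Rule structure (a default rule refined by nested exceptions) might map a wordform to a lemma by removing and adding suffixes. It illustrates only rule application, not rule induction. The `Rule` class, the `lemmatise` function and the hand-written English toy rules are assumptions made for illustration; they are not the rules induced from the Slovene lexicon, nor the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    """One node of a hypothetical Ripple Down Rule tree.

    suffix:     condition, the wordform must end with this string
    remove:     suffix stripped from the wordform (assumed to be a tail of `suffix`)
    add:        suffix appended to produce the lemma
    exceptions: more specific rules that override this one when they fire
    """
    suffix: str
    remove: str
    add: str
    exceptions: List["Rule"] = field(default_factory=list)

    def apply(self, word: str) -> Optional[str]:
        # The rule does not fire if its suffix condition fails.
        if not word.endswith(self.suffix):
            return None
        # Exceptions are tried first; the first one that fires wins.
        for exc in self.exceptions:
            result = exc.apply(word)
            if result is not None:
                return result
        # No exception fired: apply this rule's own suffix transformation.
        return word[: len(word) - len(self.remove)] + self.add


# Toy rule tree (illustrative assumption, English rather than Slovene).
# The root is the default rule: leave the wordform unchanged.
ROOT = Rule("", "", "", exceptions=[
    Rule("s", "s", "", exceptions=[
        Rule("ies", "ies", "y"),   # ponies -> pony overrides the plain "-s" rule
    ]),
    Rule("ing", "ing", ""),        # walking -> walk
])


def lemmatise(word: str) -> str:
    """Return the lemma predicted by the toy rule tree."""
    result = ROOT.apply(word)
    # The root has an empty suffix condition, so it always fires.
    assert result is not None
    return result


if __name__ == "__main__":
    for w in ["cats", "ponies", "walking", "tree"]:
        print(w, "->", lemmatise(w))
```

In this sketch each exception refines its parent only for the wordforms that the parent rule already covers, which is what keeps such a rule set easy to extend: a newly observed error can be handled by attaching one more exception without touching the rules above it.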