Automatic acquisition of two-level morphological rules
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Hi-index | 0.00 |
We propose two internal methods for accenting unknown words, which both learn on a reference set of accented words the contexts of occurrence of the various accented forms of a given letter. One method is adapted from POS tagging, the other is based on finite state transducers.We show experimental results for letter e on the French version of the Medical Subject Headings thesaurus. With the best training set, the tagging method obtains a precision-recall breakeven point of 84.2±4.4% and the transducer method 83.8±4.5% (with a baseline at 64%) for the unknown words that contain this letter. A consensus combination of both increases precision to 92.0±3.7% with a recall of 75%. We perform an error analysis and discuss further steps that might help improve over the current performance.