Accenting unknown words in a specialized language

Authors:
Pierre Zweigenbaum;Natalia Grabar
Affiliations:
Université Paris;Université Paris
Venue:
BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
Year:
2002

Citing 2
Cited 0

Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging

Computational Linguistics
Automatic acquisition of two-level morphological rules

ANLC '97 Proceedings of the fifth conference on Applied natural language processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose two internal methods for accenting unknown words, which both learn on a reference set of accented words the contexts of occurrence of the various accented forms of a given letter. One method is adapted from POS tagging, the other is based on finite state transducers.We show experimental results for letter e on the French version of the Medical Subject Headings thesaurus. With the best training set, the tagging method obtains a precision-recall breakeven point of 84.2±4.4% and the transducer method 83.8±4.5% (with a baseline at 64%) for the unknown words that contain this letter. A consensus combination of both increases precision to 92.0±3.7% with a recall of 75%. We perform an error analysis and discuss further steps that might help improve over the current performance.