Accenting unknown words in a specialized language

  • Authors:
  • Pierre Zweigenbaum;Natalia Grabar

  • Affiliations:
  • Université Paris;Université Paris

  • Venue:
  • BioMed '02 Proceedings of the ACL-02 workshop on Natural language processing in the biomedical domain - Volume 3
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose two internal methods for accenting unknown words, which both learn on a reference set of accented words the contexts of occurrence of the various accented forms of a given letter. One method is adapted from POS tagging, the other is based on finite state transducers.We show experimental results for letter e on the French version of the Medical Subject Headings thesaurus. With the best training set, the tagging method obtains a precision-recall breakeven point of 84.2±4.4% and the transducer method 83.8±4.5% (with a baseline at 64%) for the unknown words that contain this letter. A consensus combination of both increases precision to 92.0±3.7% with a recall of 75%. We perform an error analysis and discuss further steps that might help improve over the current performance.