MWU-aware part-of-speech tagging with a CRF model and lexical resources

  • Authors:
  • Matthieu Constant; Anthony Sigogne

  • Affiliations:
  • Université Paris-Est, LIGM, bd Descartes - Champs/Marne Marne-la-Vallée cedex, France; Université Paris-Est, LIGM, bd Descartes - Champs/Marne Marne-la-Vallée cedex, France

  • Venue:
  • MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
  • Year:
  • 2011

Abstract

This paper describes a new part-of-speech tagger that includes multiword unit (MWU) identification. It is based on a Conditional Random Field (CRF) model integrating language-independent features as well as features computed from external lexical resources. It is implemented in a finite-state framework consisting of a preliminary finite-state lexical analysis followed by CRF decoding using weighted finite-state transducer composition. We show that our tagger reaches state-of-the-art results for French under the standard evaluation conditions (i.e. each multiword unit is already merged into a single token). The evaluation of the tagger integrating MWU recognition clearly shows the benefit of incorporating features based on MWU resources.
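
As a rough illustration of how MWU identification can be folded into tagging, the sketch below casts the joint task as sequence labeling: each token receives a part-of-speech tag plus a B/I suffix marking whether it begins or continues a unit, and a lexicon lookup supplies per-token features of the kind a CRF could consume. The abstract does not specify the label scheme or the feature templates, so the B/I encoding, the toy lexicon `MWU_LEXICON`, the tag names, and the helpers `encode_labels` and `lexicon_features` are all assumptions made for this example, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MWU lexicon (lowercased token tuples -> POS tag).
# Entries and tag names are illustrative only.
MWU_LEXICON: Dict[Tuple[str, ...], str] = {
    ("pomme", "de", "terre"): "NC",
    ("à", "cause", "de"): "P",
}


def encode_labels(units: List[Tuple[List[str], str]]) -> List[str]:
    """Turn (tokens, POS) units into one label per token.

    The first token of a unit gets POS+B, the following ones POS+I,
    so segmentation and tagging become a single labeling problem.
    """
    labels = []
    for tokens, pos in units:
        for i in range(len(tokens)):
            labels.append(f"{pos}+{'B' if i == 0 else 'I'}")
    return labels


def lexicon_features(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[Dict[str, str]]:
    """Build per-token feature dicts with a greedy longest-match lookup.

    Tokens covered by a lexicon entry get the entry's POS+B/I as the
    'mwu_lex' feature; all other tokens get 'O'.
    """
    feats = [{"word": t.lower()} for t in tokens]
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate first; MWUs have at least two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                match_len = n
                break
        if match_len:
            pos = lexicon[key]
            for k in range(match_len):
                feats[i + k]["mwu_lex"] = f"{pos}+{'B' if k == 0 else 'I'}"
            i += match_len
        else:
            feats[i]["mwu_lex"] = "O"
            i += 1
    return feats


if __name__ == "__main__":
    gold_units = [(["la"], "DET"), (["pomme", "de", "terre"], "NC"), (["cuit"], "V")]
    sentence = [tok for toks, _ in gold_units for tok in toks]
    for tok, label, feat in zip(sentence, encode_labels(gold_units),
                                lexicon_features(sentence, MWU_LEXICON)):
        print(tok, label, feat["mwu_lex"])
```

In a setup like this, the feature dicts would be passed to a CRF trainer together with the encoded labels; the lexicon feature is only a hint, and the model remains free to disagree with it in context.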