Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields

  • Authors:
  • Matthieu Constant; Joseph Le Roux; Anthony Sigogne

  • Affiliations:
  • Université Paris-Est, LIGM, CNRS; Université Paris-Nord, LIPN, CNRS; Université Paris-Est, LIGM, CNRS

  • Venue:
  • ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 2
  • Year:
  • 2013

Abstract

The integration of compounds in a parsing procedure has been shown to improve accuracy in an artificial setting where such expressions have been perfectly pre-identified. This article evaluates two empirical strategies for incorporating such multiword units in a realistic PCFG-LA parsing setting: (1) using a grammar that includes compound recognition, by means of specialized annotation schemes for compounds; and (2) using a state-of-the-art discriminative compound pre-recognizer that integrates endogenous and exogenous features. We show how these two strategies can be combined with word lattices representing the possible lexical analyses generated by the recognizer. The proposed systems display significant gains in multiword recognition and, often, in standard parsing accuracy. Moreover, an oracle analysis shows that this combined strategy opens promising new research directions.
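
To make the notion of a word lattice over lexical analyses concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the abstract does not describe the lattice format, and a plain lookup in a toy compound lexicon stands in for the discriminative recognizer. The names `build_lattice` and `enumerate_paths`, the underscore-joined compound labels, and the example sentence are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# An arc spans token positions [start, end) and carries one lexical unit.
Arc = Tuple[int, int, str]


def build_lattice(tokens: List[str], compounds: Set[Tuple[str, ...]]) -> List[Arc]:
    """Build a lattice with one arc per simple token and one per known compound.

    Assumption: compounds come from a toy lexicon lookup, standing in for
    the candidate analyses a real recognizer would propose.
    """
    # Simple-word arcs: each token links state i to state i + 1.
    arcs: List[Arc] = [(i, i + 1, tok) for i, tok in enumerate(tokens)]
    # Compound arcs: any span of two or more tokens found in the lexicon.
    for start in range(len(tokens)):
        for end in range(start + 2, len(tokens) + 1):
            if tuple(tokens[start:end]) in compounds:
                arcs.append((start, end, "_".join(tokens[start:end])))
    return arcs


def enumerate_paths(arcs: List[Arc], n_tokens: int) -> List[List[str]]:
    """List every segmentation, i.e. every path from state 0 to state n_tokens."""
    outgoing: Dict[int, List[Tuple[int, str]]] = {}
    for start, end, label in arcs:
        outgoing.setdefault(start, []).append((end, label))

    def walk(state: int) -> List[List[str]]:
        if state == n_tokens:
            return [[]]
        return [
            [label] + rest
            for end, label in outgoing.get(state, [])
            for rest in walk(end)
        ]

    return walk(0)


# Hypothetical example sentence and lexicon entry.
tokens = ["il", "mange", "une", "pomme", "de", "terre"]
arcs = build_lattice(tokens, {("pomme", "de", "terre")})
paths = enumerate_paths(arcs, len(tokens))
for path in paths:
    print(" ".join(path))
```

Each path through the lattice is one candidate segmentation of the sentence, with or without the compound merged into a single unit. A lattice-aware parser could then choose among these paths jointly with the syntactic analysis, rather than committing to a single segmentation beforehand.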