Lexicalization in crosslinguistic probabilistic parsing: the case of French

  • Authors:
  • Abhishek Arun;Frank Keller

  • Affiliations:
  • University of Edinburgh, Edinburgh, UK;University of Edinburgh, Edinburgh, UK

  • Venue:
  • ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2005

Quantified Score

Hi-index 0.01

Visualization

Abstract

This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline model, which is enriched to the level of Collins' Model 2 by adding lexicalization and subcategorization. The lexicalized sister-head model and a bigram model are also tested, to deal with the flatness of the French Treebank. The bigram model achieves the best performance: 81% constituency F-score and 84% dependency accuracy. All lexicalized models outperform the unlexicalized baseline, consistent with probabilistic parsing results for English, but contrary to results for German, where lexicalization has only a limited effect on parsing performance.