Capitalization cues improve dependency grammar induction

  • Authors: Valentin I. Spitkovsky; Hiyan Alshawi; Daniel Jurafsky
  • Affiliations: Stanford University and Google Inc.; Google Inc., Mountain View, CA; Stanford University, Stanford, CA
  • Venue: WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
  • Year: 2012

Abstract

We show that orthographic cues can be helpful for unsupervised parsing. In the Penn Treebank, transitions between upper- and lower-case tokens tend to align with the boundaries of base (English) noun phrases. Such signals can be used as partial bracketing constraints to train a grammar inducer: in our experiments, directed dependency accuracy increased by 2.2% on average across the 14 languages that have case information. Combining capitalization-induced constraints with punctuation-induced constraints in inference further improved parsing performance, attaining state-of-the-art levels for many languages.
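
To make the idea of capitalization as a partial bracketing constraint concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names (`is_capitalized`, `capitalization_spans`, `crosses`) are hypothetical, and the rule used (a maximal run of tokens starting with an upper-case letter forms one bracket) is a simplifying assumption made only for illustration. It also ignores sentence-initial capitalization, which a real system would need to handle.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open token interval [start, end)


def is_capitalized(token: str) -> bool:
    """True if the token's first character is an upper-case letter."""
    return bool(token) and token[0].isupper()


def capitalization_spans(tokens: List[str]) -> List[Span]:
    """Return maximal runs of capitalized tokens as half-open spans.

    Assumption for illustration: each run is treated as one bracket,
    since case transitions tend to line up with phrase boundaries.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tok in enumerate(tokens):
        if is_capitalized(tok):
            if start is None:
                start = i  # a lower-to-upper transition opens a bracket
        elif start is not None:
            spans.append((start, i))  # an upper-to-lower transition closes it
            start = None
    if start is not None:
        spans.append((start, len(tokens)))  # run extends to sentence end
    return spans


def crosses(span: Span, bracket: Span) -> bool:
    """True if the two spans partially overlap (neither nested nor disjoint)."""
    (a, b), (c, d) = span, bracket
    return (a < c < b < d) or (c < a < d < b)


tokens = "the New York Stock Exchange fell on Monday".split()
brackets = capitalization_spans(tokens)
```

A grammar inducer could use such brackets as a filter, for example by disallowing candidate constituents for which `crosses` returns true against any bracket. How strictly to enforce the constraint is a design choice that this sketch leaves open.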