Learning structural dependencies of words in the Zipfian tail

  • Authors:
  • Tejaswini Deoskar;Markos Mylonakis;Khalil Sima'an

  • Affiliations:
  • University of Edinburgh;ILLC University of Amsterdam;ILLC University of Amsterdam

  • Venue:
  • IWPT '11 Proceedings of the 12th International Conference on Parsing Technologies
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Using semi-supervised EM, we learn finegrained but sparse lexical parameters of a generative parsing model (a PCFG) initially estimated over the Penn Treebank. Our lexical parameters employ supertags, which encode complex structural information at the pre-terminal level, and are particularly sparse in labeled data -- our goal is to learn these for words that are unseen or rare in the labeled data. In order to guide estimation from unlabeled data, we incorporate both structural and lexical priors from the labeled data. We get a large error reduction in parsing ambiguous structures associated with unseen verbs, the most important case of learning lexico-structural dependencies. We also obtain a statistically significant improvement in labeled bracketing score of the treebank PCFG, the first successful improvement via semi-supervised EM of a generative structured model already trained over large labeled data.