Grammar induction by MDL-based distributional classification

  • Authors:
  • Yikun Guo;Fuliang Weng;Lide Wu

  • Affiliations:
  • Computer Science Department, Fudan University, Shanghai, China;Research and Technology Center, Robert Bosch Corporation, Palo Alto, CA;Computer Science Department, Fudan University, Shanghai, China

  • Venue:
  • New developments in parsing technology
  • Year:
  • 2004

Abstract

This chapter describes our work on grammar induction using the Minimum Description Length (MDL) principle. We start with a diagnostic comparison between a basic best-first MDL induction algorithm and a pseudo-induction process, which reveals problems with the existing MDL-based approach to grammar induction. Based on this analysis, we present a novel two-stage grammar induction algorithm that overcomes a local-minimum problem in the basic algorithm by clustering the left-hand sides of the induced grammar rules with a seed grammar. Preliminary experimental results show that the resulting induction curve significantly outperforms traditional MDL-based grammar induction and, in a diagnostic comparison, is very close to the ideal case. In addition, the new algorithm induces grammar rules with high precision. Finally, we discuss future research directions for improving both the recall and the precision of the algorithm.
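
To make the idea of best-first MDL induction concrete, the following is a minimal toy sketch, not the authors' algorithm, whose details the abstract does not give. It assumes a deliberately crude two-part code (every symbol in the grammar and in the rewritten corpus costs log2 of the vocabulary size), considers only merges of adjacent symbol pairs, and invents nonterminal names of the form `N0`, `N1`, and so on. The function names `description_length`, `replace_pair`, and `induce` are hypothetical. The greedy loop stops as soon as no single merge shortens the description, which is one way a search of this kind can get stuck in a local minimum.

```python
import math
from collections import Counter


def description_length(corpus, rules):
    """Toy two-part code: bits for the grammar plus bits for the corpus.

    Assumption: every symbol costs log2(vocabulary size) bits.
    """
    symbols = {s for sent in corpus for s in sent}
    for lhs, rhs in rules.items():
        symbols.add(lhs)
        symbols.update(rhs)
    bits_per_symbol = math.log2(max(len(symbols), 2))
    grammar_size = sum(1 + len(rhs) for rhs in rules.values())
    data_size = sum(len(sent) for sent in corpus)
    return (grammar_size + data_size) * bits_per_symbol


def replace_pair(sentence, pair, new_symbol):
    """Rewrite non-overlapping occurrences of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return tuple(out)


def induce(corpus, max_steps=10):
    """Greedy best-first search: apply the merge that shrinks the DL most."""
    corpus = [tuple(s) for s in corpus]
    rules = {}
    current = description_length(corpus, rules)
    for _ in range(max_steps):
        pairs = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
        best = None
        for pair in pairs:
            # Assumption: names like "N0" do not clash with corpus symbols.
            new_symbol = f"N{len(rules)}"
            cand_rules = {**rules, new_symbol: pair}
            cand_corpus = [replace_pair(s, pair, new_symbol) for s in corpus]
            dl = description_length(cand_corpus, cand_rules)
            if dl < current and (best is None or dl < best[0]):
                best = (dl, cand_rules, cand_corpus)
        if best is None:
            break  # no single merge helps: a (possibly local) minimum
        current, rules, corpus = best
    return rules, corpus, current
```

Calling `induce([("a", "b", "c")] * 4 + [("a", "b", "d")] * 4)` returns the induced rules, the rewritten corpus, and the final description length under this toy cost model. A realistic system would use a better-motivated coding scheme and a richer set of operations than pairwise chunking.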