Parsing a natural language using mutual information statistics

  • Authors:
  • David M. Magerman; Mitchell P. Marcus

  • Affiliations:
  • CIS Department, University of Pennsylvania, Philadelphia, PA (both authors)

  • Venue:
  • AAAI'90: Proceedings of the Eighth National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1990

Abstract

The purpose of this paper is to characterize a constituent boundary parsing algorithm that uses an information-theoretic measure called generalized mutual information and serves as an alternative to traditional grammar-based parsing methods. The method rests on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part-of-speech n-grams within the sentence. This hypothesis is supported by the performance of an implementation of the parsing algorithm, which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. The paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser.
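
The hypothesis in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not define generalized mutual information, so the code below substitutes ordinary pointwise mutual information over adjacent part-of-speech tag pairs (bigrams only) and splits recursively at the gap with the lowest score. The function names, the toy corpus, and the treatment of unseen tag pairs are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tag_sequences):
    """Estimate pointwise mutual information (in bits) for adjacent tag pairs.

    Stand-in for the paper's generalized mutual information, which the
    abstract does not define.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tag_sequences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


def weakest_boundary(tags, pmi, unseen=float("-inf")):
    """Return i such that the gap between tags[i] and tags[i + 1] scores lowest.

    Assumption: a tag pair never seen in training is treated as the most
    likely boundary (score of negative infinity).
    """
    scores = [pmi.get(pair, unseen) for pair in zip(tags, tags[1:])]
    return min(range(len(scores)), key=scores.__getitem__)


def bracket(tags, pmi):
    """Build an unlabeled binary bracketing by splitting at low-PMI gaps."""
    if not tags:
        raise ValueError("tags must be non-empty")
    if len(tags) == 1:
        return tags[0]
    i = weakest_boundary(tags, pmi)
    return (bracket(tags[: i + 1], pmi), bracket(tags[i + 1 :], pmi))


# Hypothetical toy corpus of part-of-speech tag sequences.
corpus = [
    ["DT", "NN", "VB", "DT", "NN"],
    ["DT", "NN", "VB"],
    ["NN", "VB", "DT", "NN"],
]
pmi = bigram_pmi(corpus)
tree = bracket(["VB", "DT", "NN"], pmi)
```

In this toy corpus the determiner-noun pair is more strongly associated than the verb-determiner pair, so the split falls between the verb and the noun phrase. A practical system would need far more training data and, per the abstract, a statistic that looks at n-grams rather than bigrams alone.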