Parsing a natural language using mutual information statistics

Authors:
David M. Magerman; Mitchell P. Marcus
Affiliations:
CIS Department, University of Pennsylvania, Philadelphia, PA; CIS Department, University of Pennsylvania, Philadelphia, PA
Venue:
AAAI'90 Proceedings of the eighth National conference on Artificial intelligence - Volume 2
Year:
1990


Abstract

This paper characterizes a constituent boundary parsing algorithm that uses an information-theoretic measure called generalized mutual information and serves as an alternative to traditional grammar-based parsing methods. The method rests on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part-of-speech n-grams within the sentence. This hypothesis is supported by the performance of an implementation of the parsing algorithm, which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. The paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser.
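
To make the underlying hypothesis concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes ordinary pointwise mutual information over adjacent part-of-speech tag pairs in place of the paper's generalized mutual information, and it assumes that a gap whose score is lower than both neighboring gaps is a candidate constituent boundary. The function names `bigram_pmi` and `boundary_candidates` and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def bigram_pmi(tagged_corpus):
    """Estimate pointwise mutual information for adjacent POS tags.

    tagged_corpus is a list of tag sequences. This is plain bigram PMI,
    an assumed simplification of generalized mutual information.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tagged_corpus:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    def pmi(x, y):
        # Unseen pairs get -inf: no evidence that the tags belong together.
        if bigrams[(x, y)] == 0:
            return float("-inf")
        p_xy = bigrams[(x, y)] / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        return math.log2(p_xy / (p_x * p_y))

    return pmi

def boundary_candidates(tags, pmi):
    """Return gap indices i (between tags[i] and tags[i+1]) whose PMI is
    strictly lower than both neighboring gaps. Edge gaps are skipped
    because they have only one neighbor (an assumption of this sketch)."""
    scores = [pmi(a, b) for a, b in zip(tags, tags[1:])]
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]
    ]

# Hypothetical toy corpus of POS-tag sequences (not data from the paper).
toy_corpus = [
    ["DT", "NN", "VB", "DT", "NN"],
    ["DT", "NN", "VB"],
    ["DT", "JJ", "NN", "VB", "DT", "NN"],
]
pmi = bigram_pmi(toy_corpus)
candidates = boundary_candidates(["DT", "NN", "VB", "DT", "NN"], pmi)
```

Here `candidates` holds the positions where the tag sequence is most weakly associated relative to its neighbors. A recursive bracketing, as described in the abstract, would require applying such a criterion repeatedly to the resulting segments, which this sketch does not attempt.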