This paper characterizes a constituent boundary parsing algorithm that uses an information-theoretic measure called generalized mutual information as an alternative to traditional grammar-based parsing methods. The method rests on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part-of-speech n-grams within the sentence. The hypothesis is supported by the performance of an implementation of the algorithm, which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. The paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser.
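
As a rough illustration of the hypothesis, the sketch below brackets a part-of-speech sequence by repeatedly splitting it where adjacent tags are least associated. It is not the paper's algorithm: it substitutes ordinary pointwise mutual information over adjacent tag pairs for the generalized mutual information statistic, which the abstract does not define. The function names (`train_bigram_pmi`, `bracket`), the add-one smoothing, the greedy top-down split, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram_pmi(tag_sequences):
    """Estimate pointwise mutual information for adjacent POS-tag pairs.

    Assumption: plain bigram PMI with add-one smoothing stands in for the
    paper's generalized mutual information statistic.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tags in tag_sequences:
        unigrams.update(tags)
        bigrams.update(zip(tags, tags[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    vocab = len(unigrams)

    def pmi(a, b):
        # Add-one smoothing keeps unseen tags and pairs finite.
        p_ab = (bigrams[(a, b)] + 1) / (n_bi + vocab * vocab)
        p_a = (unigrams[a] + 1) / (n_uni + vocab)
        p_b = (unigrams[b] + 1) / (n_uni + vocab)
        return math.log(p_ab / (p_a * p_b))

    return pmi


def bracket(tags, pmi):
    """Return a recursive unlabeled bracketing as nested tuples.

    Assumption: a greedy top-down split at the adjacent pair with the
    lowest score, treating low association as a likely boundary.
    """
    if len(tags) == 1:
        return tags[0]
    split = min(range(len(tags) - 1), key=lambda i: pmi(tags[i], tags[i + 1]))
    return (bracket(tags[: split + 1], pmi), bracket(tags[split + 1:], pmi))


# Hypothetical toy corpus of tag sequences, for illustration only.
corpus = [
    ["DT", "NN", "VB", "DT", "NN"],
    ["DT", "NN", "VB"],
]
pmi = train_bigram_pmi(corpus)
print(bracket(["DT", "NN", "VB", "DT", "NN"], pmi))
```

Because `bracket` accepts any scoring function of two tags, a different association statistic can be swapped in without changing the splitting logic.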