Towards history-based grammars: using richer models for probabilistic parsing

Authors:
Ezra Black;Fred Jelinek;John Lafferty;David M. Magerman;Robert Mercer;Salim Roukos
Affiliations:
IBM T. J. Watson Research Center;IBM T. J. Watson Research Center;IBM T. J. Watson Research Center;IBM T. J. Watson Research Center;IBM T. J. Watson Research Center;IBM T. J. Watson Research Center
Venue:
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Year:
1993

Abstract

We describe a generative probabilistic model of natural language, which we call HBG (history-based grammar), that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the aspects of a parse tree that determine the correct parse of a sentence. This contrasts with the usual approach of tailoring the grammar further through linguistic introspection in the hope of generating the correct parse. In head-to-head tests against one of the best existing robust probabilistic parsing models, which we call P-CFG, HBG significantly outperforms P-CFG, raising parsing accuracy from 60% to 75%, a 37% reduction in error.
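
The reported error reduction follows from the two accuracy figures: accuracies of 60% and 75% correspond to error rates of 40% and 25%, and (40 - 25) / 40 = 37.5%, which the abstract gives as 37%. The short sketch below reproduces that arithmetic. The function name `relative_error_reduction` is an illustrative assumption and does not come from the paper.

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Return the fraction of the baseline error removed by the new model.

    Hypothetical helper for illustration; accuracies are fractions in [0, 1].
    """
    baseline_error = 1.0 - baseline_accuracy
    new_error = 1.0 - new_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline accuracy must be below 1.0")
    return (baseline_error - new_error) / baseline_error


# Figures from the abstract: P-CFG at 60% accuracy, HBG at 75%.
reduction = relative_error_reduction(0.60, 0.75)
print(f"{reduction:.1%}")
```

Expressing the gain relative to the baseline error, rather than as a 15-point accuracy difference, makes results comparable across baselines of differing strength.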