A new statistical parser based on bigram lexical dependencies

Authors:
Michael John Collins
Affiliations:
University of Pennsylvania, Philadelphia, PA
Venue:
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Year:
1996

Cited by:
163

References (10):
Self-organized language modeling for speech recognition
Readings in speech recognition

Procedure for quantitatively comparing the syntactic coverage of English grammars
HLT '91 Proceedings of the workshop on Speech and Natural Language

Generalized probabilistic LR parsing of natural language (Corpora) with unification-based grammars
Computational Linguistics - Special issue on using large corpora: I

Structural ambiguity and lexical relations
Computational Linguistics - Special issue on using large corpora: I

Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II

A stochastic parts program and noun phrase parser for unrestricted text
ANLC '88 Proceedings of the second conference on Applied natural language processing

Pearl: a probabilistic chart parser
EACL '91 Proceedings of the fifth conference on European chapter of the Association for Computational Linguistics

Statistical decision-tree models for parsing
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics

Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics

Decision tree parsing using a hidden derivation model
HLT '94 Proceedings of the workshop on Human Language Technology

Abstract

This paper describes a new statistical parser based on probabilities of dependencies between head-words in the parse tree. Standard bigram probability estimation techniques are extended to calculate probabilities of dependencies between pairs of words. Tests using Wall Street Journal data show that the method performs at least as well as SPATTER (Magerman 95; Jelinek et al. 94), which has the best published results for a statistical parser on this task. Because the approach is simple, the model trains on 40,000 sentences in under 15 minutes. With a beam search strategy, parsing speed can be improved to over 200 sentences a minute with negligible loss in accuracy.
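The abstract describes two ideas: estimating the probability of a dependency between two words from co-occurrence counts, in the manner of bigram estimation, and pruning the search with a beam. The Python sketch below illustrates both under stated assumptions. The class and function names, the relation label "SUBJ", the single back-off step from (word, tag) pairs to tag pairs, and the default beam ratio are all hypothetical; the paper's exact estimator, smoothing scheme and beam criterion are not reproduced here.

```python
from collections import Counter


class DependencyModel:
    """Relative-frequency estimate of P(relation | head (word, tag), modifier (word, tag)).

    Hypothetical sketch: if the (word, tag) pair was never seen together in
    training, the estimate backs off to counts over the tag pair alone.
    Not the paper's exact estimator.
    """

    def __init__(self):
        self.pair_words = Counter()  # (head, head_tag, mod, mod_tag) -> co-occurrences
        self.rel_words = Counter()   # (head, head_tag, mod, mod_tag, rel) -> linked count
        self.pair_tags = Counter()   # (head_tag, mod_tag) -> co-occurrences
        self.rel_tags = Counter()    # (head_tag, mod_tag, rel) -> linked count

    def observe(self, head, head_tag, mod, mod_tag, relation=None):
        """Record one co-occurrence of two words in a training sentence.

        `relation` is None when the pair co-occurs but is not linked by a
        dependency in the gold-standard tree.
        """
        self.pair_words[(head, head_tag, mod, mod_tag)] += 1
        self.pair_tags[(head_tag, mod_tag)] += 1
        if relation is not None:
            self.rel_words[(head, head_tag, mod, mod_tag, relation)] += 1
            self.rel_tags[(head_tag, mod_tag, relation)] += 1

    def prob(self, head, head_tag, mod, mod_tag, relation):
        """Word-level relative frequency if the pair was seen, else tag-level."""
        # Counter returns 0 for missing keys without inserting them.
        seen = self.pair_words[(head, head_tag, mod, mod_tag)]
        if seen > 0:
            return self.rel_words[(head, head_tag, mod, mod_tag, relation)] / seen
        seen = self.pair_tags[(head_tag, mod_tag)]
        if seen > 0:
            return self.rel_tags[(head_tag, mod_tag, relation)] / seen
        return 0.0


def tree_score(model, dependencies):
    """Score a candidate parse as the product of its dependency probabilities."""
    score = 1.0
    for dep in dependencies:
        score *= model.prob(*dep)
    return score


def prune_beam(hypotheses, ratio=1e-4):
    """Keep (hypothesis, score) items scoring at least `ratio` times the best one."""
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    return [(h, s) for h, s in hypotheses if s >= best * ratio]


if __name__ == "__main__":
    # Toy counts; "SUBJ" is a hypothetical relation label.
    model = DependencyModel()
    model.observe("rose", "VBD", "profits", "NNS", "SUBJ")
    model.observe("rose", "VBD", "profits", "NNS", "SUBJ")
    model.observe("rose", "VBD", "profits", "NNS")  # co-occur, not linked
    print(model.prob("rose", "VBD", "profits", "NNS", "SUBJ"))  # word-level: 2/3
    print(model.prob("fell", "VBD", "sales", "NNS", "SUBJ"))    # backed off to tags: 2/3
```

Because a parse is scored as a product of dependency probabilities, partial hypotheses covering the same span can be compared directly, which is what makes a ratio-based beam like `prune_beam` straightforward. A practical implementation would usually sum log-probabilities instead of multiplying raw probabilities to avoid numerical underflow on long sentences.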