Punctuation: making a point in unsupervised dependency parsing

  • Authors:
  • Valentin I. Spitkovsky; Hiyan Alshawi; Daniel Jurafsky

  • Affiliations:
  • Stanford University and Google Inc.; Google Inc., Mountain View, CA; Stanford University, Stanford, CA

  • Venue:
  • CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning

  • Year:
  • 2011

Abstract

We show how punctuation can be used to improve unsupervised dependency parsing. Our linguistic analysis confirms the strong connection between English punctuation and phrase boundaries in the Penn Treebank. However, approaches that naively include punctuation marks in the grammar (as if they were words) do not perform well with Klein and Manning's Dependency Model with Valence (DMV). Instead, we split a sentence at punctuation and impose parsing restrictions over its fragments. Our grammar inducer is trained on the Wall Street Journal (WSJ) and achieves 59.5% accuracy out-of-domain (Brown sentences with 100 or fewer words), more than 6% higher than the previous best results. Further evaluation, using the 2006/7 CoNLL sets, reveals that punctuation aids grammar induction in 17 of 18 languages, for an overall average net gain of 1.3%. Some of this improvement comes from training, but more than half comes from parsing with induced constraints at inference time. Punctuation-aware decoding works with existing (even already-trained) parsing models and increased accuracy in all of our experiments.
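The split-and-constrain idea lends itself to a compact illustration. Below is a minimal sketch (ours, not the authors' code) of the two pieces the abstract describes: splitting a sentence into inter-punctuation fragments, and checking a candidate parse against a "loose"-style constraint under which each fragment attaches to the rest of the tree through exactly one word. The punctuation set, the function names, and the head-array parse representation (heads[i] is the index of token i's head, with -1 for the sentence root) are all illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code): fragment splitting plus a
# "loose"-style constraint check. PUNCT, the function names, and the
# head-array parse representation are illustrative assumptions.

PUNCT = {",", ";", ":", ".", "!", "?", "(", ")", "``", "''", "--"}

def split_fragments(tokens):
    """Return (start, end) spans (end exclusive) covering the maximal
    runs of non-punctuation tokens: the inter-punctuation fragments."""
    fragments, start = [], None
    for i, tok in enumerate(tokens):
        if tok in PUNCT:
            if start is not None:
                fragments.append((start, i))
            start = None
        elif start is None:
            start = i
    if start is not None:
        fragments.append((start, len(tokens)))
    return fragments

def satisfies_loose_constraint(heads, fragments):
    """Check a candidate parse: within each fragment, exactly one token
    (the fragment's internal root) may take its head outside the
    fragment; every other token must attach fragment-internally.
    heads[i] is the head index of token i; -1 marks the sentence root."""
    for start, end in fragments:
        external = sum(1 for i in range(start, end)
                       if not (start <= heads[i] < end))
        if external != 1:
            return False
    return True

if __name__ == "__main__":
    tokens = ["Bach", "'s", "Air", ",", "often", "heard", ",",
              "is", "lovely", "."]
    # One plausible parse; punctuation heads are never inspected.
    heads = [1, 2, 7, -1, 5, 2, -1, -1, 7, -1]
    frags = split_fragments(tokens)   # [(0, 3), (4, 6), (7, 9)]
    print(frags, satisfies_loose_constraint(heads, frags))  # ... True
```

In the paper, constraints of this kind are applied both when training the grammar inducer and when decoding with an existing model; the sketch shows only the constraint test itself, which a decoder could use to rule out candidate parses that let words attach across fragment boundaries.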