Unsupervised dependency parsing without gold part-of-speech tags

  • Authors:
  • Valentin I. Spitkovsky, Hiyan Alshawi, Angel X. Chang, Daniel Jurafsky

  • Affiliations:
  • Valentin I. Spitkovsky: Stanford University, Stanford, CA, and Google Research, Google Inc., Mountain View, CA; Hiyan Alshawi: Google Research, Google Inc., Mountain View, CA; Angel X. Chang: Stanford University, Stanford, CA, and Google Research, Google Inc., Mountain View, CA; Daniel Jurafsky: Stanford University, Stanford, CA

  • Venue:
  • EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year:
  • 2011

Abstract

We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags --- requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus --- 0.7% higher than using gold tags.
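The key idea in the abstract is that the same word type should be allowed different induced tags in different contexts, which a plain one-cluster-per-word-type scheme forbids. A minimal way to see this effect (a toy sketch, not the paper's actual sequence modeling technique; all states, words, and probabilities below are made up for illustration) is a two-state HMM whose Viterbi decode tags a word by its context:

```python
# Toy illustration (NOT the paper's method): a two-state HMM over induced
# clusters C0/C1. Because decoding depends on neighboring words, the word
# "run" can receive different tags in different contexts, unlike a
# one-tag-per-word-type clustering.
import math

STATES = ["C0", "C1"]  # induced word clusters standing in for POS tags

# Hand-set toy parameters in log space; all values are illustrative.
start = {"C0": math.log(0.5), "C1": math.log(0.5)}
trans = {
    ("C0", "C0"): math.log(0.2), ("C0", "C1"): math.log(0.8),
    ("C1", "C0"): math.log(0.7), ("C1", "C1"): math.log(0.3),
}
emit = {
    ("C0", "the"): math.log(0.4), ("C0", "they"): math.log(0.1),
    ("C0", "run"): math.log(0.5),
    ("C1", "the"): math.log(0.1), ("C1", "they"): math.log(0.4),
    ("C1", "run"): math.log(0.5),
}

def viterbi(words):
    """Return the most probable state (tag) sequence for `words`."""
    # Initialize with start * emission scores for the first word.
    V = [{s: start[s] + emit[(s, words[0])] for s in STATES}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for reaching `s` at this position.
            prev = max(STATES, key=lambda p: V[-1][p] + trans[(p, s)])
            col[s] = V[-1][prev] + trans[(prev, s)] + emit[(s, w)]
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Backtrace from the best final state.
    last = max(STATES, key=lambda s: V[-1][s])
    tags = [last]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return list(reversed(tags))

print(viterbi(["the", "run"]))   # "run" after "the"
print(viterbi(["they", "run"]))  # "run" after "they" gets a different tag
```

Here "run" is emitted equally well by both states, so its tag is decided entirely by context, which is the property the abstract argues gold tags owe their advantage to.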