Two baselines for unsupervised dependency parsing

Authors:
Anders Søgaard
Affiliations:
University of Copenhagen, Copenhagen S
Venue:
WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
Year:
2012

References:
The anatomy of a large-scale hypertextual Web search engine. WWW7 Proceedings of the seventh international conference on World Wide Web 7
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics - Special issue on using large corpora: II
Sparsity in dependency grammar induction. ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Using universal linguistic knowledge to guide grammar induction. EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Neutralizing linguistically problematic annotations in unsupervised dependency parsing evaluation. HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Unsupervised dependency parsing without training. Natural Language Engineering

Cited by:
The PASCAL Challenge on Grammar Induction. WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure

Abstract

Results in unsupervised dependency parsing are typically compared to branching baselines and the DMV-EM parser of Klein and Manning (2004). State-of-the-art results are now well beyond these baselines. This paper describes two simple, heuristic baselines that are much harder to beat: the algorithm recently presented in Søgaard (2012) and a heuristic application of the universal rules presented in Naseem et al. (2010). Our first baseline (RANK) outperforms existing baselines, including PR-DMV (Gillenwater et al., 2010), while relying only on raw text, but all systems submitted to the PASCAL Grammar Induction Challenge score better. Our second baseline (RULES), however, outperforms several submitted systems.
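
For readers unfamiliar with the branching baselines mentioned in the abstract, the following is a minimal sketch of what such baselines commonly look like, together with an unlabeled attachment score. It is not taken from the paper: the function names, the head-index convention (1-indexed words, 0 for an artificial root), and the exact attachment directions are assumptions made for illustration, and the paper's RANK and RULES baselines are not reproduced here.

```python
from typing import List


def attach_to_left_neighbor(n: int) -> List[int]:
    """Heads for an n-word sentence: each word's head is the word before it.

    Assumed convention: words are 1-indexed and 0 is an artificial root,
    so the first word attaches to the root.
    """
    return [i - 1 for i in range(1, n + 1)]


def attach_to_right_neighbor(n: int) -> List[int]:
    """Heads for an n-word sentence: each word's head is the word after it.

    The last word attaches to the artificial root (0).
    """
    return [i + 1 if i < n else 0 for i in range(1, n + 1)]


def unlabeled_attachment_score(pred: List[int], gold: List[int]) -> float:
    """Fraction of words whose predicted head matches the gold head."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(pred, gold))
    return correct / len(gold)


# Hypothetical gold heads for "the dog barks": the -> dog, dog -> barks, barks -> root
gold_heads = [2, 3, 0]
left_score = unlabeled_attachment_score(attach_to_left_neighbor(3), gold_heads)
right_score = unlabeled_attachment_score(attach_to_right_neighbor(3), gold_heads)
```

On this toy sentence the right-neighbor baseline recovers every head and the left-neighbor baseline recovers none, which illustrates why such baselines are sensitive to the head direction of the language and annotation scheme.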