Foundations of Statistical Natural Language Processing.
A DOP model for semantic interpretation. ACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
The structure of shared forests in ambiguous parsing. ACL '89: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics.
An all-subtrees approach to unsupervised parsing. ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics.
Learning auxiliary fronting with grammatical inference. CoNLL-X '06: Proceedings of the Tenth Conference on Computational Natural Language Learning.
A unified model of structural organization in language and music. Journal of Artificial Intelligence Research.
Item-based constructions and the logical problem. PMHLA '05: Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition.
Natural language grammar induction with a generative constituent-context model. Pattern Recognition.
Darwinised data-oriented parsing: statistical NLP with added sex and death. CACLA '09: Proceedings of the EACL 2009 Workshop on Cognitive Aspects of Computational Language Acquisition.
Unsupervised Data-Oriented Parsing (U-DOP) models represent a class of structure-bootstrapping models that have achieved some of the best unsupervised parsing results in the literature. While U-DOP was originally proposed as an engineering approach to language learning (Bod 2005, 2006a), the model turns out to have a number of properties that are also of linguistic and cognitive interest. In this paper we focus on the original U-DOP model proposed in Bod (2005), which computes the most probable tree from among the shortest derivations of sentences. We show that this U-DOP model can learn both rule-based and exemplar-based aspects of language, ranging from agreement and movement phenomena to discontiguous constructions, provided that productive units of arbitrary size are allowed. We argue that our results suggest a rapprochement between nativism and empiricism.
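The selection criterion described above (the most probable tree among the shortest derivations) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Derivation` record, its fields, and the example values are all hypothetical, standing in for a derivation's resulting tree, the number of subtrees combined to build it, and its probability (in DOP, the product of its subtrees' probabilities).

```python
# Toy sketch of U-DOP's parse selection (Bod 2005): among the
# shortest derivations of a sentence, return the most probable tree.
# All names and values here are illustrative, not from the paper.
from collections import namedtuple

# tree: the parse tree produced; length: number of subtrees combined;
# prob: the derivation's probability.
Derivation = namedtuple("Derivation", ["tree", "length", "prob"])

def best_parse(derivations):
    """Most probable tree among the shortest derivations, or None."""
    if not derivations:
        return None
    shortest = min(d.length for d in derivations)
    candidates = [d for d in derivations if d.length == shortest]
    return max(candidates, key=lambda d: d.prob).tree

# Hypothetical example: t2 and t3 need only two subtrees each,
# so the longer (three-subtree) derivation of t1 is excluded even
# though it is the most probable overall.
ds = [
    Derivation("t1", 3, 0.20),
    Derivation("t2", 2, 0.05),
    Derivation("t3", 2, 0.01),
]
print(best_parse(ds))  # prints "t2"
```

Note that shortness of derivation acts as a hard filter before probability is consulted, which biases the model toward reusing large stored fragments, the exemplar-based aspect the abstract refers to.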