Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Structural ambiguity and lexical relations
Computational Linguistics - Special issue on using large corpora: I
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Exploiting auxiliary distributions in stochastic unification-based grammars
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Three generative, lexicalised models for statistical parsing
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Some properties of preposition and subordinate conjunction attachments
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
RCV1: A New Benchmark Collection for Text Categorization Research
The Journal of Machine Learning Research
Learning random walk models for inducing word dependency distributions
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Lexicalization of probabilistic grammars
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A maximum entropy model for prepositional phrase attachment
HLT '94 Proceedings of the workshop on Human Language Technology
Intricacies of Collins' Parsing Model
Computational Linguistics
A lattice-based framework for enhancing statistical parsers with information from unlabeled corpora
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Statistical parsing with a context-free grammar and word statistics
AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
The effect of corpus size on case frame acquisition for discourse analysis
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Hi-index | 0.00 |
We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The supervised component is Collins' parser, trained on the Wall Street Journal. The unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. We find that the combined system only improves the performance of the parser for small training sets. Surprisingly, the size of the unannotated corpus has little effect due to the noisiness of the lexical statistics acquired by unsupervised learning.