Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank report impressive numbers these days, but they do not do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl), and (3) unannotated monolingual (e.g. Google N-grams). Size matters: (1) is a million words, (2) is potentially billions of words, and (3) is potentially trillions of words. The unannotated monolingual data is helpful when the ambiguity can be resolved through associations among the lexical items. The bilingual data is helpful when the ambiguity can be resolved by the order of words in the translation. We train separate classifiers with monolingual and bilingual features and iteratively improve them via co-training. The co-trained classifier achieves close to 96% accuracy on Treebank data and makes 20% fewer errors than a supervised system trained with Treebank annotations.
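The co-training loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the nearest-centroid classifier stands in for whatever learner the authors used, the single-number "monolingual" and "bilingual" feature vectors are invented toy values, the labels 1 and 0 are placeholders for the two bracketings of a coordinated NP, and all names (`CentroidClassifier`, `co_train`, `fit_views`) are hypothetical. Each round, one classifier per view labels the unlabeled example it is most confident about, and that example joins the shared training set.

```python
class CentroidClassifier:
    """Toy stand-in for a real classifier: predicts the nearest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_with_margin(self, x):
        # Squared distance to each class mean; the margin is the gap between
        # the closest and the second-closest class (binary task assumed).
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)), label)
            for label, c in self.centroids.items()
        )
        (d_best, best), (d_next, _) = dists[0], dists[1]
        return best, d_next - d_best


def fit_views(labeled):
    """Train one classifier per view on (view_a, view_b, label) triples."""
    xa = [a for a, _, _ in labeled]
    xb = [b for _, b, _ in labeled]
    y = [t for _, _, t in labeled]
    return CentroidClassifier().fit(xa, y), CentroidClassifier().fit(xb, y)


def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """Grow the labeled set with each view's most confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        clf_a, clf_b = fit_views(labeled)
        for view, clf in ((0, clf_a), (1, clf_b)):
            # Rank the remaining pool by this view's confidence, highest first.
            ranked = sorted(
                pool,
                key=lambda ex: clf.predict_with_margin(ex[view])[1],
                reverse=True,
            )
            for ex in ranked[:per_round]:
                label, _ = clf.predict_with_margin(ex[view])
                labeled.append((ex[0], ex[1], label))
                pool.remove(ex)
    clf_a, clf_b = fit_views(labeled)
    return clf_a, clf_b, labeled


# Toy data (invented): view A mimics a monolingual association score,
# view B mimics a bilingual word-order cue.
labeled = [((0.9,), (1.0,), 1), ((0.1,), (0.0,), 0)]
unlabeled = [
    ((0.95,), (0.55,)),  # view A is confident, view B is not
    ((0.45,), (0.02,)),  # view B is confident, view A is not
    ((0.15,), (0.45,)),
    ((0.55,), (0.90,)),
]

clf_a, clf_b, grown = co_train(labeled, unlabeled)
print([y for _, _, y in grown])
```

In this toy run each example is labeled by the view that finds it easy, so the other view gains training data it could not have labeled reliably on its own. A real system would need many more features, a held-out set to decide when to stop, and a check that the two views disagree often enough to be useful.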