A systematic comparison of various statistical alignment models
Computational Linguistics
Ultraconservative online algorithms for multiclass problems
The Journal of Machine Learning Research
Inside-outside reestimation from partially bracketed corpora
ACL '92 Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics
Inducing multilingual text analysis tools via robust projection across aligned corpora
HLT '01 Proceedings of the first international conference on Human language technology research
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
The Penn Treebank: annotating predicate argument structure
HLT '94 Proceedings of the workshop on Human Language Technology
Bootstrapping parsers via syntactic projection across parallel texts
Natural Language Engineering
Corpus-based induction of syntactic structure: models of dependency and constituency
ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Annealing techniques for unsupervised statistical language learning
ACL '04 Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
Pseudo-projective dependency parsing
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Contrastive estimation: training log-linear models on unlabeled data
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Optimal constituent alignment with edge covers for semantic projection
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Non-projective dependency parsing using spanning tree algorithms
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Partial training for a lexicalized-grammar parser
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
Effective self-training for parsing
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics
CoNLL-X shared task on multilingual dependency parsing
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Multilingual dependency analysis with a two-stage discriminative parser
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Labeled pseudo-projective dependency parsing with support vector machines
CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Semi-supervised dependency parsing using generalized tri-training
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Notes on the evaluation of dependency parsers obtained through cross-lingual projection
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Data point selection for cross-language adaptation of dependency parsers
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2
We present a simple but very effective approach to identifying high-quality data in noisy data sets for structured problems such as parsing, by greedily exploiting partial structures. We analyze our approach in an annotation projection framework for dependency trees, and show how dependency parsers from two different paradigms (graph-based and transition-based) can be trained on the resulting tree fragments. We train parsers for Dutch to evaluate our method and to investigate to what degree graph-based and transition-based parsers can benefit from incomplete training data. We find that partial correspondence projection gives rise to parsers that outperform parsers trained on aggressively filtered data sets, and achieve unlabeled attachment scores that are only 5% behind the average UAS for Dutch in the CoNLL-X Shared Task on supervised parsing (Buchholz and Marsi, 2006).
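The core idea of greedily exploiting partial structures can be illustrated with a minimal sketch: given a noisily projected dependency tree whose edges carry a projection confidence (e.g. derived from word-alignment quality), keep only high-confidence edges, best first, so that a consistent tree fragment survives for training. All names here (`Edge`, `select_fragment`, the threshold value) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch (not the paper's code): greedy selection of a
# high-confidence dependency fragment from a noisy projected tree.
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    head: int       # index of the head token (0 = artificial root)
    dependent: int  # index of the dependent token
    score: float    # projection confidence, e.g. from word alignment

def select_fragment(edges, threshold=0.7):
    """Greedily keep edges above the threshold, best-scoring first,
    discarding any edge that would give a token a second head."""
    kept, has_head = [], set()
    for e in sorted(edges, key=lambda e: e.score, reverse=True):
        if e.score >= threshold and e.dependent not in has_head:
            kept.append(e)
            has_head.add(e.dependent)
    return kept

# Toy projected tree: token 2 has two competing heads; the
# low-confidence attachment of token 3 is dropped entirely.
edges = [Edge(0, 2, 0.9), Edge(2, 1, 0.8), Edge(1, 2, 0.6), Edge(2, 3, 0.4)]
fragment = select_fragment(edges)
print(sorted((e.head, e.dependent) for e in fragment))  # [(0, 2), (2, 1)]
```

The resulting fragments are exactly the kind of incomplete training material the abstract refers to: a graph-based parser can treat the missing attachments as unconstrained during training, while a transition-based parser can be trained on the transition sequences consistent with the kept edges.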