Creating large amounts of manually annotated training data for statistical parsers imposes a heavy cognitive load on the human annotator and is therefore costly and error-prone. It is thus important to reduce the human effort involved in creating training data without harming parser performance. For constituency parsers, this effort is traditionally evaluated using the total number of constituents (TC) measure, which assumes a uniform cost for each annotated item. In this paper, we introduce novel measures that quantify aspects of the annotator's cognitive effort that the TC measure does not reflect, and show that they are well established in the psycholinguistic literature. We present a novel parameter-based sample selection approach for creating samples that score well on these measures. We describe methods for global optimisation of the lexical parameters of the sample, based on a novel optimisation problem, the constrained multiset multicover problem, and for cluster-based sampling according to syntactic parameters. Our methods outperform previously suggested methods in terms of the new measures while maintaining similar TC performance.
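To make the lexical optimisation step more concrete, the following is a minimal sketch of a generic greedy heuristic for a multiset multicover instance. It is an illustration under stated assumptions, not the method of the paper: we assume that each sentence is treated as a multiset of words, that each word has a required number of occurrences in the sample, that each sentence may be selected at most once, and that the cost of a sentence is its length in tokens. The function name `greedy_multiset_multicover` and the parameters `sentences` and `demand` are hypothetical.

```python
from collections import Counter


def greedy_multiset_multicover(sentences, demand):
    """Greedy heuristic for an assumed multiset multicover formulation.

    sentences: list of token lists; each sentence is a multiset of words.
    demand:    dict mapping word -> required number of occurrences.
    Each sentence can be chosen at most once (assumed constraint).
    Returns the indices of the chosen sentences in selection order.
    """
    # Words whose demand is still unmet, with the remaining amount.
    residual = Counter({w: d for w, d in demand.items() if d > 0})
    bags = [Counter(s) for s in sentences]
    remaining = set(range(len(sentences)))
    chosen = []

    def gain(i):
        # Unmet demand that sentence i would cover, capped per word.
        return sum(min(c, residual[w]) for w, c in bags[i].items() if w in residual)

    while residual and remaining:
        # Assumed cost: sentence length, so the score is coverage per token.
        # Ties are broken in favour of the lowest index for determinism.
        best = max(remaining, key=lambda i: (gain(i) / max(len(sentences[i]), 1), -i))
        if gain(best) == 0:
            break  # no remaining sentence helps; the demand cannot be met
        for w, c in bags[best].items():
            if w in residual:
                residual[w] -= c
                if residual[w] <= 0:
                    del residual[w]
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Calling the function with a small pool of tokenised sentences and a demand such as `{"the": 2, "dog": 1, "cat": 1}` returns the indices of a subset that meets the demand where possible. A greedy rule of this kind gives no optimality guarantee for the assumed cost, and a different cost function (for example, one based on the number of constituents) could be substituted in the scoring line.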