Corpus-based statistical parsing relies on large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes using sample selection to find helpful training examples and to reduce the human effort spent annotating less informative ones. We consider several criteria for predicting whether an unlabeled example is likely to be helpful for training. To compare the effect of the different predictive criteria, experiments are performed across two syntactic learning tasks and, within the single task of parsing, across two learning models. We find that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.
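
As a rough illustration of the uncertainty criterion mentioned above, the following minimal sketch ranks unlabeled sentences by the entropy of the current model's distribution over candidate parses and picks the most uncertain ones for annotation. It is not the paper's implementation: the function names (`parse_entropy`, `select_most_uncertain`), the toy pool, and the use of plain unnormalized Shannon entropy as the uncertainty score are all assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence


def parse_entropy(scores: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a model's scores over candidate parses.

    `scores` are non-negative weights for the candidate parses of one
    sentence (hypothetical input); they are normalized to sum to 1 here.
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain positive mass")
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def select_most_uncertain(
    pool: Dict[str, Sequence[float]], batch_size: int
) -> List[str]:
    """Return the ids of the `batch_size` sentences with the highest entropy."""
    ranked = sorted(pool, key=lambda sid: parse_entropy(pool[sid]), reverse=True)
    return ranked[:batch_size]


# Toy pool (assumed data): sentence id -> scores of its candidate parses.
pool = {
    "sent_a": [0.97, 0.02, 0.01],  # model is confident, low entropy
    "sent_b": [0.40, 0.35, 0.25],  # model is unsure, high entropy
    "sent_c": [0.50, 0.50],        # two equally likely parses
}

# These sentences would be sent to a human annotator first.
batch = select_most_uncertain(pool, batch_size=2)
print(batch)
```

In a full selection loop, the chosen sentences would be annotated, added to the training set, and the model retrained before the next batch is scored. Longer sentences tend to have more candidate parses and therefore higher raw entropy, so a practical scorer might normalize for this; that choice is left out of the sketch.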