Machine learning often relies on costly labeled data, and this impedes its application to new classification and information extraction problems. This has motivated the development of methods for leveraging abundant prior knowledge about these problems, including methods for lightly supervised learning using model expectation constraints. Building on this work, we envision an interactive training paradigm in which practitioners perform evaluation, analyze errors, and provide and refine expectation constraints in a closed loop. In this paper, we focus on several key subproblems in this paradigm that can be cast as selecting a representative sample of the unlabeled data for the practitioner to inspect. To address these problems, we propose stratified sampling methods that use model expectations as a proxy for latent output variables. In classification and sequence labeling experiments, these sampling strategies reduce accuracy evaluation effort by as much as 53%, provide more reliable estimates of $F_1$ for rare labels, and aid in the specification and refinement of constraints.
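The stratified sampling idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes examples are binned into strata by the model's confidence in its own prediction (a simple proxy for the latent correctness of each output), a fixed number of examples per stratum is shown to the practitioner, and per-stratum accuracies are combined with weights proportional to stratum size. The function and parameter names (`stratified_accuracy_estimate`, `num_strata`, `per_stratum`, the `oracle` callback standing in for practitioner inspection) are hypothetical.

```python
import random
from collections import defaultdict

def stratified_accuracy_estimate(probs, oracle, num_strata=5, per_stratum=10, seed=0):
    """Estimate classifier accuracy by stratified sampling (illustrative sketch).

    probs:  per-example probability the model assigns to its predicted label,
            used as a proxy for the unknown correctness of each prediction.
    oracle: function i -> bool, whether prediction i is correct (stands in
            for the practitioner inspecting example i).
    """
    rng = random.Random(seed)
    n = len(probs)

    # Assign each unlabeled example to a stratum by model confidence.
    strata = defaultdict(list)
    for i, p in enumerate(probs):
        s = min(int(p * num_strata), num_strata - 1)
        strata[s].append(i)

    # Sample uniformly within each stratum; weight per-stratum accuracy
    # by the stratum's share of the full pool.
    estimate = 0.0
    for idxs in strata.values():
        sample = rng.sample(idxs, min(per_stratum, len(idxs)))
        acc_s = sum(oracle(i) for i in sample) / len(sample)
        estimate += (len(idxs) / n) * acc_s
    return estimate
```

Because correctness tends to correlate with confidence, strata are more homogeneous than the full pool, so the stratified estimator reaches a given variance with fewer inspected examples than uniform sampling would, which is the source of the evaluation-effort savings claimed above.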