Taking into account the differences between actively and passively acquired data: the case of active learning with support vector machines for imbalanced datasets

Authors:
Michael Bloodgood;K. Vijay-Shanker
Affiliations:
Johns Hopkins University, Baltimore, MD;University of Delaware, Newark, DE
Venue:
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Year:
2009

Citing 6
Cited 7

Applied statistics: a first course

Applied statistics: a first course
Making large-scale support vector machine learning practical

Advances in kernel methods
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Combining Statistical Learning with a Knowledge-Based Approach - A Case Study in Intensive Care Monitoring

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Learning on the border: active learning in imbalanced data classification

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
A method for stopping active learning based on stabilizing predictions and the need for user-adjustable stopping

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning

A method for stopping active learning based on stabilizing predictions and the need for user-adjustable stopping

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Reducing class imbalance during active learning for named entity annotation

Proceedings of the fifth international conference on Knowledge capture
Why label when you can search?: alternatives to active learning for applying human resources to build classification models under extreme class imbalance

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Bucking the trend: large-scale cost-focused active learning for statistical machine translation

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Using Mechanical Turk to build machine translation evaluation sets

CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Inactive learning?: difficulties employing active learning in practice

ACM SIGKDD Explorations Newsletter
A new probabilistic active sample selection algorithm for class imbalance problem

International Journal of Knowledge Engineering and Soft Data Paradigms

Quantified Score

Hi-index	0.00

Visualization

Abstract

Actively sampled data can have very different characteristics than passively sampled data. Therefore, it's promising to investigate using different inference procedures during AL than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted SVMs for imbalanced data, a situation that arises for many HLT tasks. The key idea behind the proposed InitPA method for addressing imbalance is to base cost models during AL on an estimate of overall corpus imbalance computed via a small unbiased sample rather than the imbalance in the labeled training data, which is the leading method used during PL.