Active Sampling for Class Probability Estimation and Ranking

  • Authors: Maytal Saar-Tsechansky, Foster Provost

  • Affiliations:
  • Department of Management Science and Information Systems, Red McCombs School of Business, The University of Texas at Austin, Austin, Texas 78712, USA. maytal.saar-tsechansky@bus.utexas.edu ...
  • Department of Information, Operations & Management Sciences, Leonard N. Stern School of Business, New York University, 44 West Fourth Street, New York, NY 10012, USA. fprovost@stern.nyu ...

  • Venue: Machine Learning
  • Year: 2004

Abstract

In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV requires fewer examples to reach a given estimation accuracy, and they provide insights into the behavior of the two algorithms. Finally, we experiment with another new active sampling algorithm that draws from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
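Code sketch (illustrative). The following Python sketch illustrates the sampling idea described in the abstract: train several models on bootstrap resamples of the currently labeled data, score each unlabeled example by the variance of its estimated class probability across those models, and draw the next batch by weighted sampling so that higher-variance examples are more likely to be selected. The helper name bootstrap_lv_step, the use of decision trees, the batch size, and the normalization of the weights are assumptions made for illustration, not the paper's specification.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bootstrap_lv_step(X_labeled, y_labeled, X_pool, n_bootstrap=10,
                          batch_size=10, rng=None):
        """Pick indices of pool examples to label next (illustrative sketch).

        Scores each unlabeled example by the variance of its estimated
        probability of class 1 across bootstrap models, then draws a batch
        by weighted sampling from the resulting distribution.
        Assumes X_labeled, y_labeled, and X_pool are NumPy arrays with
        binary labels 0/1.
        """
        if rng is None:
            rng = np.random.default_rng(0)
        n = len(X_labeled)

        estimates = []
        for _ in range(n_bootstrap):
            # Resample the labeled data with replacement and fit a model.
            idx = rng.integers(0, n, size=n)
            model = DecisionTreeClassifier(random_state=0).fit(X_labeled[idx],
                                                               y_labeled[idx])
            proba = model.predict_proba(X_pool)
            # Probability assigned to class 1; if a resample contained only
            # one class, fall back to 0.5 (a simplification for the sketch).
            estimates.append(proba[:, 1] if proba.shape[1] == 2
                             else np.full(len(X_pool), 0.5))
        estimates = np.vstack(estimates)      # shape: (n_bootstrap, n_pool)

        # Local variance of the probability estimates for each pool example.
        variance = estimates.var(axis=0)

        # Weighted sampling: higher-variance examples are more likely to be
        # chosen, but every example retains some chance of selection.
        weights = variance + 1e-12
        weights = weights / weights.sum()
        return rng.choice(len(X_pool), size=batch_size, replace=False, p=weights)

This sketch normalizes the raw bootstrap variances directly to form the sampling distribution; it captures the spirit of weighting by estimated uncertainty, but the exact scoring and sampling distribution used by BOOTSTRAP-LV are defined in the paper itself.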