Many business applications track changes over time, for example, measuring the monthly prevalence of influenza incidents. Where a classifier is used to identify the relevant incidents, imperfect classification accuracy can substantially bias estimates of class prevalence. The paper defines two research challenges for machine learning. The 'quantification' task is to accurately estimate the number of positive cases (or the class distribution) in a test set, given a training set whose distribution may differ substantially. The 'cost quantification' variant estimates the total cost associated with the positive class, where each case is tagged with a cost attribute, such as the expense of resolving the case. Quantification has a very different utility model from traditional classification research. For both forms of quantification, the paper describes a variety of methods and evaluates them with a suitable methodology, revealing which methods give reliable estimates when training data are scarce, the test class distribution differs widely from training, and the positive class is rare (e.g., 1% positives). These strengths can make quantification practical for business use, even where classification accuracy is poor.
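To illustrate why naive counting of classifier outputs is biased, consider the 'adjusted count' correction, a standard technique from the quantification literature (function and parameter names here are illustrative, not taken from the paper): the observed positive rate satisfies observed = p·TPR + (1−p)·FPR for true prevalence p, so p can be recovered by inverting that relation, using TPR and FPR estimated from cross-validation on the training set.

```python
def adjusted_count(observed_pos_rate, tpr, fpr):
    """Estimate true positive-class prevalence from classifier output counts.

    observed_pos_rate: fraction of test cases the classifier labeled positive
    tpr, fpr: classifier true/false positive rates, typically estimated
              via cross-validation on the training set (assumed known here)
    """
    if tpr <= fpr:
        raise ValueError("classifier must be better than chance (TPR > FPR)")
    # Invert: observed = p * tpr + (1 - p) * fpr  =>  p = (observed - fpr) / (tpr - fpr)
    p = (observed_pos_rate - fpr) / (tpr - fpr)
    # Sampling noise can push the estimate outside [0, 1]; clip it back.
    return max(0.0, min(1.0, p))

# With 1% true prevalence, TPR=0.8, FPR=0.05, the raw count would report
# 0.01*0.8 + 0.99*0.05 = 0.0575 -- nearly 6x the true prevalence.
print(adjusted_count(0.0575, 0.8, 0.05))  # recovers ~0.01
```

Even this correction degrades when training data are scarce (noisy TPR/FPR estimates) or the positive class is very rare, which is precisely the regime the paper's evaluation methodology targets.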