Data mining: practical machine learning tools and techniques with Java implementations
ThemeRiver: Visualizing Thematic Changes in Large Document Collections
IEEE Transactions on Visualization and Computer Graphics
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
KBA: Kernel Boundary Alignment Considering Imbalanced Data Distribution
IEEE Transactions on Knowledge and Data Engineering
Discovering evolutionary theme patterns from text: an exploration of temporal text mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Pragmatic text mining: minimizing human effort to quantify many issues in call logs
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Counting positives accurately despite inaccurate classification
ECML'05 Proceedings of the 16th European conference on Machine Learning
Tackling concept drift by temporal inductive transfer
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Non-stationary data sequence classification using online class priors estimation
Pattern Recognition
Scaling up text classification for large file systems
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Quantifying counts and costs via classification
Data Mining and Knowledge Discovery
ICIAR '08 Proceedings of the 5th international conference on Image Analysis and Recognition
Quantifying the proportion of damaged sperm cells based on image analysis and neural networks
SMO'08 Proceedings of the 8th conference on Simulation, modelling and optimization
Transfer estimation of evolving class priors in data stream classification
Pattern Recognition
Class distribution estimation based on the Hellinger distance
Information Sciences: an International Journal
Aggregative quantification for regression
Data Mining and Knowledge Discovery
This paper promotes a new task for supervised machine learning research: quantification, the pursuit of learning methods for accurately estimating the class distribution of a test set, with no concern for predictions on individual cases. A variant for cost quantification addresses the need to total up costs according to categories predicted by imperfect classifiers. These tasks cover a large and important family of applications that measure trends over time.

The paper establishes a research methodology and uses it to evaluate several proposed methods that involve selecting the classification threshold in a way that would spoil the accuracy of individual classifications. In empirical tests, Median Sweep methods show outstanding ability to estimate the class distribution, despite wide disparity in testing and training conditions. The paper addresses shifting class priors and costs, but not concept drift in general.
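
The idea of estimating a class distribution from an imperfect classifier can be illustrated with a small sketch. The abstract does not spell out the formulas, so everything below is an assumption made for illustration: the observed fraction of positive predictions is corrected using true and false positive rates measured on labeled validation data, and a median-sweep style estimate takes the median of such corrected estimates over many thresholds. The function names, the `min_gap` parameter and its default of 0.25 are hypothetical and are not taken from the paper.

```python
from statistics import median


def rates(scores, labels, threshold):
    """Return (tpr, fpr) of the rule `score >= threshold` on labeled data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


def adjusted_count(test_scores, threshold, tpr, fpr):
    """Correct the observed positive rate for known classifier error.

    Solves observed = p * tpr + (1 - p) * fpr for p, then clips to [0, 1].
    """
    observed = sum(s >= threshold for s in test_scores) / len(test_scores)
    estimate = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))


def median_sweep(val_scores, val_labels, test_scores, min_gap=0.25):
    """Median of adjusted-count estimates over all candidate thresholds.

    Thresholds where tpr - fpr is not above `min_gap` (an assumed cutoff)
    are skipped, because the correction divides by that difference.
    """
    estimates = []
    for t in sorted(set(val_scores)):
        tpr, fpr = rates(val_scores, val_labels, t)
        if tpr - fpr > min_gap:
            estimates.append(adjusted_count(test_scores, t, tpr, fpr))
    if not estimates:
        raise ValueError("no threshold separates the classes well enough")
    return median(estimates)


# Toy data (made up): classifier scores with known labels for validation.
val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]
# An unlabeled test set; here it simply reuses the validation scores.
test_scores = list(val_scores)
estimated_positive_fraction = median_sweep(val_scores, val_labels, test_scores)
```

Note that no individual test case is labeled here; only the aggregate fraction is estimated, which is the distinction the abstract draws between quantification and classification.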