It is common to develop and validate classifiers through a process of repeated testing, with nested training and/or test sets of increasing size. We demonstrate in this paper that such repeated testing leads to biased estimates of classifier effectiveness. Experiments on a range of text classification tasks under three sequential testing frameworks show that all three lead to optimistic estimates of effectiveness. We calculate empirical adjustments to remove this bias from estimates on our data set, and identify directions for research that could lead to general techniques for avoiding bias while reducing labeling costs.
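
To make the direction of the bias concrete, the following minimal Monte Carlo sketch (not the paper's experimental setup; the checkpoint sizes, acceptance threshold, and true accuracy are illustrative assumptions) simulates a certification-style stopping rule on nested test sets and compares the reported effectiveness at the stopping point with the classifier's true accuracy.

# Illustrative sketch only: a classifier with known true accuracy is evaluated
# on nested test sets of increasing size, and evaluation stops at the first
# checkpoint where the observed accuracy reaches the target.
import random
import statistics

TRUE_ACCURACY = 0.80                      # assumed true effectiveness
TARGET = 0.80                             # acceptance threshold / stopping rule
CHECKPOINTS = [100, 200, 400, 800, 1600]  # nested test-set sizes (assumed)
TRIALS = 20_000

def sequential_estimate(rng):
    """Grow the test set; report effectiveness at the first checkpoint whose
    observed accuracy reaches TARGET, or at the final checkpoint otherwise."""
    correct, n = 0, 0
    for size in CHECKPOINTS:
        # label only the additional documents needed to reach this checkpoint
        correct += sum(rng.random() < TRUE_ACCURACY for _ in range(size - n))
        n = size
        estimate = correct / n
        if estimate >= TARGET:            # early stop: certification passed
            return estimate
    return estimate                       # never passed: report final estimate

rng = random.Random(0)
estimates = [sequential_estimate(rng) for _ in range(TRIALS)]
print(f"true accuracy:            {TRUE_ACCURACY:.3f}")
print(f"mean sequential estimate: {statistics.mean(estimates):.3f}")
# The mean reported estimate typically exceeds TRUE_ACCURACY, because stopping
# as soon as a favorable estimate is observed preferentially reports favorable
# random fluctuations; this is the optimistic bias described in the abstract.
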