In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain. Semi-supervised learning and active learning are two strategies that aim to maximize the effectiveness of the resulting classifiers for a given amount of training effort. Both strategies have been actively investigated for TC in recent years. Much less research has been devoted to a third such strategy, training label cleaning (TLC), which consists of devising ranking functions that sort the original training examples by how likely it is that the human annotator has mislabelled them. Such a ranking gives the human annotator a convenient means of revising the training set, starting from the most suspicious examples, so as to improve its quality. Working in the context of boosting-based learning methods for multilabel classification, we present three different techniques for performing TLC and, on three widely used TC benchmarks, evaluate them by their ability to spot training documents that, for experimental reasons only, we have purposefully mislabelled. We also evaluate the degradation in classification effectiveness that these mislabelled documents bring about, and the extent to which training label cleaning can prevent this degradation.
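To make the idea of a TLC ranking function concrete, the following is a minimal Python sketch, not one of the three techniques the paper presents (the abstract does not specify them). It assumes, hypothetically, that each training example for a given category has an assigned label in {+1, -1} and a signed classifier confidence score, and it ranks examples by how strongly a confident prediction contradicts the assigned label. The function names, the input format, and the toy data are all illustrative assumptions.

```python
from typing import List, Sequence, Set


def rank_by_suspicion(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Return example indices, most suspicious first.

    labels: assigned labels, +1 (in the category) or -1 (not in it).
    scores: signed classifier confidences; the sign is the predicted label
            and the magnitude is the confidence (hypothetical input format).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Suspicion is high when a confident prediction contradicts the label.
    suspicion = [-y * s for y, s in zip(labels, scores)]
    return sorted(range(len(labels)), key=lambda i: suspicion[i], reverse=True)


def precision_at_k(ranking: Sequence[int], flipped: Set[int], k: int) -> float:
    """Fraction of the top-k ranked examples that were deliberately mislabelled."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    return sum(1 for i in top if i in flipped) / k


# Toy data (made up for illustration): examples 1 and 4 carry flipped labels.
labels = [+1, +1, -1, -1, -1]
scores = [2.0, -1.5, -0.7, -2.2, 1.8]
flipped = {1, 4}

ranking = rank_by_suspicion(labels, scores)
print(ranking)                              # indices, most suspicious first
print(precision_at_k(ranking, flipped, 2))  # share of flipped labels in the top 2
```

An annotator would inspect the examples in the returned order and stop once the budget for revision is spent. Measuring how many purposefully mislabelled examples appear near the top mirrors, in spirit, the evaluation setup described above.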