We address the problem of category annotation errors, which degrade the overall performance of text classification. We use two techniques. The first is the set of support vectors extracted from the training samples by a machine learning technique, Support Vector Machines (SVM). The second is a loss function, which measures the degree of our disappointment in any difference between the true distribution over inputs and the learner's prediction. We apply the loss function to the extracted support vectors and correct annotation errors. Experimental results on the RWCP and Reuters 1996 corpora show that our method achieves high precision in detecting and correcting annotation errors. Further, correcting these errors improves text classification accuracy.
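The two-step idea described above (extract support vectors, then score them with a loss function) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not specify the loss function or the correction rule, so the sketch assumes a linear soft-margin SVM trained by plain subgradient descent, uses the hinge loss as a stand-in, and flips a binary label when the loss exceeds a hypothetical threshold of 1.0. The toy data and all function names are made up for illustration.

```python
# Minimal sketch: flag suspicious labels among SVM support vectors.
# Assumptions (not from the paper): linear soft-margin SVM, full-batch
# subgradient descent, hinge loss as the loss function, threshold 1.0.

def decision(w, b, x):
    """Signed score of the linear classifier for one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=2000):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def hinge_loss(w, b, x, label):
    return max(0.0, 1.0 - label * decision(w, b, x))

def find_label_errors(X, y, threshold=1.0, tol=1e-3):
    """Return (support vector indices, flagged indices, corrected labels)."""
    w, b = train_linear_svm(X, y)
    # Support vectors: samples on or inside the margin (up to a tolerance).
    support = [i for i, (xi, yi) in enumerate(zip(X, y))
               if yi * decision(w, b, xi) <= 1.0 + tol]
    # Only support vectors are scored; a high loss marks a suspect label.
    flagged = [i for i in support if hinge_loss(w, b, X[i], y[i]) > threshold]
    corrected = list(y)
    for i in flagged:
        corrected[i] = -corrected[i]  # binary case: flip the label
    return support, flagged, corrected

# Toy data: two well-separated classes plus one deliberately wrong label.
X = [(2, 2), (3, 2), (2, 3), (3, 3),
     (-2, -2), (-3, -2), (-2, -3), (-3, -3),
     (2.5, 2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1, -1]  # last label is the injected error

support, flagged, corrected = find_label_errors(X, y)
print("support vectors:", support)
print("flagged:", flagged)
print("corrected labels:", corrected)
```

Restricting the check to support vectors keeps the number of scored samples small, since mislabeled samples tend to end up on or inside the margin. In a multi-category setting such as the corpora mentioned above, the flip step would have to be replaced by a choice among candidate categories, which this sketch does not attempt.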