Correcting category errors in text classification

Authors:
Fumiyo Fukumoto;Yoshimi Suzuki
Affiliations:
Univ. of Yamanashi;Univ. of Yamanashi
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Citing 6
Cited 5

The nature of statistical learning theory

The nature of statistical learning theory
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical classification of Web content

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Toward Optimal Active Learning through Sampling Estimation of Error Reduction

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Detecting errors within a corpus using anomaly detection

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Detecting errors in corpora using support vector machines

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1

Training Data Cleaning for Text Classification

ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Collaborative data cleaning for sentiment classification with noisy training corpus

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Reducing the need for double annotation

LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
A utility-theoretic ranking method for semi-automated text classification

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Improving Text Classification Accuracy by Training Label Cleaning

ACM Transactions on Information Systems (TOIS)

Quantified Score

Hi-index	0.00

Visualization

Abstract

We address the problem dealing with category annotation errors which deteriorate the overall performance of text classification. We use two techniques. The first is support vectors which are extracted from the training samples by a machine learning technique, Support Vector Machines (SVM). The second is a loss function which measures the degree of our disappointment in any differences between the true distribution over inputs and the learner's prediction. We apply it to the extracted support vectors, and correct annotation errors. Experimental results with the RWCP and the Reuters 1996 corpora show that our method achieves high precision in detecting and correcting annotation errors. Further, results on text classification improves accuracy.