Estimation of class membership probabilities in the document classification

Authors:
Kazuko Takahashi;Hiroya Takamura;Manabu Okumura
Affiliations:
Keiai University, Faculty of International Studies, Sakura, Japan;Tokyo Institute of Technology, Precision and Intelligence Laboratory, Yokohama, Japan;Tokyo Institute of Technology, Precision and Intelligence Laboratory, Yokohama, Japan
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007


Abstract

We propose a method for estimating the class membership probability of a predicted class in document classification, using the classification scores not only of the predicted class but also of the other classes. Class membership probabilities are important in many document classification applications, where multiclass classification is often used. In the proposed method, we first build an accuracy table by counting the number of correctly classified training samples in each range or cell of classification scores. We then smooth the accuracy table with methods such as a moving average method with coverage. To determine the class membership probability of an unknown sample, we calculate its classification scores, find the range or cell that corresponds to those scores, and output the value associated with that range or cell in the accuracy table. Through experiments on two different datasets with both Support Vector Machine and Naive Bayes classifiers, we empirically show that using multiple classification scores is effective for estimating class membership probabilities, and that the proposed smoothing methods for the accuracy table work well. We also show that the class membership probabilities estimated by the proposed method are useful for detecting misclassified samples.
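The accuracy-table procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of two scores per sample (the top and second-highest class scores), the shared bin edges, and the window radius are all assumptions made for illustration. The abstract does not define the "moving average method with coverage", so a plain count-weighted moving average over neighbouring cells is substituted here.

```python
import numpy as np

def cell_index(score, edges):
    """Map a score (or array of scores) to a bin index; out-of-range scores go to the end bins."""
    n = len(edges) - 1
    return np.clip(np.digitize(score, edges) - 1, 0, n - 1)

def build_accuracy_table(top, second, correct, edges):
    """Count all samples and correctly classified samples in each (top, second) score cell."""
    n = len(edges) - 1
    i = cell_index(np.asarray(top), edges)
    j = cell_index(np.asarray(second), edges)
    total = np.zeros((n, n))
    hits = np.zeros((n, n))
    np.add.at(total, (i, j), 1)
    np.add.at(hits, (i, j), np.asarray(correct, dtype=float))
    return hits, total

def smooth(hits, total, radius=1):
    """Count-weighted moving average (an assumed stand-in for the paper's smoothing)."""
    n, m = total.shape
    table = np.full((n, m), np.nan)  # NaN marks cells with no training samples nearby
    for a in range(n):
        for b in range(m):
            window = (slice(max(a - radius, 0), a + radius + 1),
                      slice(max(b - radius, 0), b + radius + 1))
            count = total[window].sum()
            if count > 0:
                table[a, b] = hits[window].sum() / count
    return table

def lookup(table, top, second, edges):
    """Return the table value for the cell that an unknown sample's scores fall into."""
    i = int(cell_index(top, edges))
    j = int(cell_index(second, edges))
    return table[i, j]

# Toy training data: top score, second score, and whether the prediction was correct.
edges = np.array([0.0, 0.5, 1.0])
hits, total = build_accuracy_table(
    top=[0.9, 0.8, 0.2, 0.3],
    second=[0.1, 0.2, 0.1, 0.4],
    correct=[1, 1, 0, 1],
    edges=edges,
)
table = smooth(hits, total, radius=1)
estimate = lookup(table, top=0.85, second=0.05, edges=edges)
```

With `radius=0` the table holds the raw per-cell accuracy; larger radii trade resolution for stability in sparsely populated cells. A low `estimate` could then be used to flag a prediction as a likely misclassification, in the spirit of the abstract's final claim.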