We introduce a Bayesian model, BayesANIL, that estimates the uncertainties associated with the labeling process. Given a labeled or partially labeled training corpus of text documents, the model estimates the joint distribution of training documents and class labels using a generalization of the Expectation-Maximization (EM) algorithm. These estimates can be used in standard classification models to reduce error rates. Because uncertainties in the labeling are taken into account, the model provides an elegant mechanism for dealing with noisy labels. We also propose an intuitive modification to the EM iterations that re-estimates the empirical distribution, reinforcing feature values in unlabeled data and reducing the influence of noisily labeled examples. Considerable improvements in the classification accuracy of two popular classification algorithms on standard labeled datasets, with and without artificially introduced noise, and in both the presence and absence of unlabeled data, indicate that this may be a promising method for reducing the burden of manual labeling.
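The flavor of the approach can be illustrated with a minimal sketch. This is not the BayesANIL model itself, but a generic soft-label EM for a multinomial naive Bayes classifier in the spirit the abstract describes: observed labels are softened by a trust parameter (here `alpha`, an assumed hyperparameter) so that noisy labels lose influence, and unlabeled documents contribute through their posterior responsibilities. All data and variable names below are illustrative.

```python
import numpy as np

# Toy corpus: word-count vectors over a 4-word vocabulary.
X_lab = np.array([[5, 1, 0, 0], [4, 2, 0, 1],   # labeled as class 0
                  [0, 0, 5, 2], [1, 0, 3, 4]])  # labeled as class 1
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([[3, 1, 1, 0], [0, 1, 4, 3]])  # unlabeled documents

K = 2            # number of classes
alpha = 0.9      # trust in the given labels; lower values model noisier labeling

# Soft label matrix: observed labels, softened to admit labeling error.
R_lab = np.full((len(y_lab), K), (1 - alpha) / (K - 1))
R_lab[np.arange(len(y_lab)), y_lab] = alpha
R_unl = np.full((len(X_unl), K), 1.0 / K)  # uniform start for unlabeled docs

X = np.vstack([X_lab, X_unl])
R = np.vstack([R_lab, R_unl])

for _ in range(20):
    # M-step: class priors and per-class word distributions (Laplace smoothing),
    # weighting each document by its current soft class membership.
    prior = R.sum(axis=0) / R.sum()
    counts = R.T @ X + 1.0
    theta = counts / counts.sum(axis=1, keepdims=True)

    # E-step: posterior responsibilities under the current multinomial model.
    log_post = np.log(prior) + X @ np.log(theta).T
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Re-estimation in the spirit of the abstract: blend model posteriors with
    # the softened observed labels, so unlabeled documents acquire sharp
    # estimates while inconsistent (possibly mislabeled) examples are damped.
    R = np.vstack([alpha * R_lab + (1 - alpha) * post[:len(y_lab)],
                   post[len(y_lab):]])

pred = post.argmax(axis=1)  # hard class predictions for all six documents
```

The resulting soft matrix `R` can either be thresholded for a hard labeling or fed as fractional counts into a downstream classifier, which is how the estimated label uncertainties would reduce error rates in a standard model.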