Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
MetaCost: a general method for making classifiers cost-sensitive
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
IEEE Transactions on Pattern Analysis and Machine Intelligence
Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Feature Engineering for Text Classification
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Classification of seismic signals by integrating ensembles of neural networks
IEEE Transactions on Signal Processing
A fast subspace text categorization method using parallel classifiers
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
A particular characteristic of text classification tasks is that they present large class imbalances. This problem can be tackled with re-sampling methods, but although these approaches are simple to implement, tuning them effectively is not straightforward: it is unclear whether oversampling is more effective than undersampling, and which oversampling or undersampling rate should be used. This paper presents a method for combining different expressions of the re-sampling approach in a mixture-of-experts framework. The proposed combination scheme is evaluated on a highly imbalanced subset of the REUTERS-21578 text collection and is shown to be very effective on this domain.
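The idea of combining differently re-sampled training sets in a mixture of experts can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the toy 1-D data, the nearest-centroid "expert", the majority vote, and the specific re-sampling rates are all assumptions chosen only to make the scheme concrete.

```python
# Sketch: train one "expert" per re-sampling configuration (several
# undersampling and oversampling rates), then combine them by majority vote.
import random

random.seed(0)

def make_imbalanced(n_pos=20, n_neg=200):
    # Toy 1-D data: minority class (1) near 2.0, majority class (0) near 0.0.
    pos = [(random.gauss(2.0, 0.7), 1) for _ in range(n_pos)]
    neg = [(random.gauss(0.0, 0.7), 0) for _ in range(n_neg)]
    return pos + neg

def undersample(data, rate):
    # Keep a fraction `rate` of the majority class, all of the minority.
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    return pos + random.sample(neg, max(1, int(len(neg) * rate)))

def oversample(data, rate):
    # Duplicate the minority class `rate` times, keep the majority as-is.
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    return pos * max(1, int(rate)) + neg

def train_centroid(data):
    # Trivial stand-in classifier: predict the class of the nearest centroid.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    cp, cn = sum(pos) / len(pos), sum(neg) / len(neg)
    return lambda x: 1 if abs(x - cp) < abs(x - cn) else 0

data = make_imbalanced()
# One expert per re-sampling rate; the rates here are arbitrary examples.
experts = ([train_centroid(undersample(data, r)) for r in (0.1, 0.3, 0.5)]
           + [train_centroid(oversample(data, r)) for r in (2, 5, 10)])

def ensemble(x):
    # Majority vote over all experts.
    votes = sum(e(x) for e in experts)
    return 1 if votes > len(experts) / 2 else 0

print(ensemble(1.9), ensemble(0.1))
```

Each expert sees the same task under a different class distribution, so the combination sidesteps the need to pick a single "best" re-sampling rate in advance, which is the difficulty the abstract points out.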