Random Subspace Method in Text Categorization

Authors:
Mehrdad J. Gangeh; Mohamed S. Kamel; Robert P. W. Duin
Venue:
ICPR '10 Proceedings of the 2010 20th International Conference on Pattern Recognition
Year:
2010

Abstract

In text categorization (TC), a supervised learning task, documents are usually represented by a feature vector of terms or phrases. Because even a moderate-size text corpus contains a huge number of terms, a high-dimensional feature space is an intrinsic problem in TC. The random subspace method (RSM), a technique that divides the feature space into smaller subspaces, each submitted to a base classifier (BC) in an ensemble, can be an effective approach to reducing the dimensionality of the feature space. Inspired by similar research on functional magnetic resonance imaging (fMRI) of the brain, we address the estimation of the ensemble parameters, i.e., the ensemble size (L) and the dimensionality of the feature subsets (M), by defining three criteria: usability, coverage, and diversity of the ensemble. We show that a moderate M and a small L yield an ensemble that improves on the performance of a single support vector machine, which is considered the state of the art in TC.
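The random subspace idea described in the abstract can be illustrated with a minimal sketch. Several points here are assumptions rather than the authors' setup: a nearest-centroid rule is swapped in for the support vector machine base classifier so the example needs only the standard library, the base classifiers are combined by majority vote, the term-count data is invented, and all function names are hypothetical. The `feature_coverage` helper only reports the fraction of features used by at least one subset; it is an illustrative notion and not necessarily the coverage criterion defined in the paper.

```python
import random
from collections import Counter


def draw_subspaces(n_features, L, M, seed=0):
    """Draw L random feature subsets of size M (no repeats within a subset)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), M)) for _ in range(L)]


def fit_centroids(X, y, features):
    """Base classifier (assumed stand-in for an SVM): per-class mean
    of the training rows, restricted to the given feature subset."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, features, row):
    """Assign the class whose centroid is closest in the subspace."""
    proj = [row[f] for f in features]

    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(proj, center))

    return min(centroids, key=lambda label: sqdist(centroids[label]))


def rsm_fit(X, y, L, M, seed=0):
    """Train one base classifier per random feature subset."""
    subspaces = draw_subspaces(len(X[0]), L, M, seed)
    return [(feats, fit_centroids(X, y, feats)) for feats in subspaces]


def rsm_predict(ensemble, row):
    """Combine the base classifiers by majority vote (an assumed combiner)."""
    votes = Counter(predict_one(c, feats, row) for feats, c in ensemble)
    return votes.most_common(1)[0][0]


def feature_coverage(ensemble, n_features):
    """Fraction of features that appear in at least one subset."""
    used = {f for feats, _ in ensemble for f in feats}
    return len(used) / n_features


# Invented toy data: rows are documents, columns are term counts.
X = [
    [3, 2, 3, 0, 0, 0],
    [2, 3, 2, 0, 1, 0],
    [0, 0, 1, 3, 2, 3],
    [0, 1, 0, 2, 3, 2],
]
y = ["sports", "sports", "finance", "finance"]

L, M = 5, 3  # ensemble size and subset dimensionality
ensemble = rsm_fit(X, y, L=L, M=M, seed=0)
prediction = rsm_predict(ensemble, [4, 3, 2, 0, 0, 1])
coverage = feature_coverage(ensemble, n_features=len(X[0]))
```

In this sketch each base classifier sees only M of the six features, so the trade-off the abstract describes is visible directly: a larger M or L raises the chance that every feature is used by some classifier, at the cost of larger or more numerous base classifiers.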