We use n-gram descriptors for a protein classification task. Because they are generated automatically, many of them are irrelevant and/or redundant. In this paper, we evaluate several strategies for feature selection and feature reduction. First, we evaluate separately the efficiency of a filter-based feature selection algorithm and of a feature reduction based on singular value decomposition (SVD). Then, we evaluate the combination of the two approaches: we propose using a very tolerant filter to select, on a univariate basis, which attributes to include in the subsequent SVD. We expect that features extracted from relevant descriptors should allow us to build a better classifier. We test the approaches on two non-linear classifiers: a 3-nearest-neighbor classifier, which is very sensitive to high dimensionality, and an SVM with an RBF kernel, which is well regularized. The results show that the behavior of the approaches depends mainly on the characteristics of the supervised learning algorithm.
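The combined approach (a tolerant univariate filter followed by SVD) can be illustrated with a minimal NumPy sketch. The details here are assumptions made for illustration, not the method of the paper: the filter scores each n-gram by its absolute Pearson correlation with a binary class label, the threshold of 0.1 is arbitrary, and the four sequences are toy data. The function names `ngram_counts`, `tolerant_filter`, and `svd_project` are hypothetical.

```python
from collections import Counter

import numpy as np


def ngram_counts(sequences, n=2):
    """Count overlapping n-grams per sequence; return (count matrix, vocabulary)."""
    counts = [Counter(s[i:i + n] for i in range(len(s) - n + 1)) for s in sequences]
    vocab = sorted(set().union(*counts))
    X = np.array([[c[g] for g in vocab] for c in counts], dtype=float)
    return X, vocab


def tolerant_filter(X, y, threshold=0.1):
    """Keep columns whose |Pearson correlation| with the label exceeds a low threshold.

    Assumption: correlation stands in for whichever univariate score is used.
    Constant columns get a score of 0 and are dropped.
    """
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    score = np.zeros(X.shape[1])
    nonzero = denom > 0
    score[nonzero] = np.abs(Xc[:, nonzero].T @ yc) / denom[nonzero]
    return np.flatnonzero(score > threshold)


def svd_project(X, k):
    """Project centered rows onto the top-k right singular vectors."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:min(k, Vt.shape[0])]
    return (X - mean) @ components.T, mean, components


# Toy data (made up): two sequences per class.
sequences = ["MKVLAAGIVKVL", "MKVLGAGIVKAL", "GGSPGGSPTTSP", "GGSPTTSPGGSP"]
y = [0, 0, 1, 1]

X, vocab = ngram_counts(sequences, n=2)      # all 2-gram descriptors
kept = tolerant_filter(X, y, threshold=0.1)  # discard clearly irrelevant ones
Z, mean, components = svd_project(X[:, kept], k=2)  # reduce what remains
```

The reduced matrix `Z` would then be passed to the classifier (a 3-nearest-neighbor or an RBF-kernel SVM in the setting described above). In a real evaluation, the filter and the SVD should be fitted on the training fold only and then applied to the test fold, so that the class labels of test examples do not leak into the feature construction.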