A New Feature Selection Method for Improving the Precision of Diagnosing Abnormal Protein Sequences by Support Vector Machine and Vectorization Method

Authors:
Eun-Mi Kim;Jong-Cheol Jeong;Ho-Young Pae;Bae-Ho Lee
Affiliations:
Dept. of Computer Engineering, Chonnam National University, Republic of Korea;Dept. of Electrical Engineering & Computer Science, The University of Kansas, USA;Dept. of Computer Engineering, Chonnam National University, Republic of Korea;Dept. of Computer Engineering, Chonnam National University, Republic of Korea
Venue:
ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part II
Year:
2007

Abstract

Pattern recognition and classification are among the most popular problems in machine learning, and they seem to be entering a second golden age with bioinformatics. However, bioinformatics datasets have several distinctive characteristics compared with the datasets used in classical pattern recognition and classification research. One of the main difficulties in applying these methods to bioinformatics is that raw DNA or protein sequences cannot be used directly as input to machine learning algorithms, because each sequence has its own length. This paper therefore introduces one method for overcoming this difficulty and, through simple experiments, argues that its generalization capability is very poor. Finally, this paper proposes a different approach that selects a fixed number of effective features by using a Support Vector Machine and a noise whitening method. The paper also defines the criteria for the proposed method and shows, through an experiment classifying an ovarian cancer dataset, that it improves the precision of diagnosing abnormal protein sequences.
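The abstract does not give the paper's vectorization scheme or its selection criteria, so the sketch below is not the authors' method. It only illustrates the general idea under two assumed stand-ins. First, k-mer (composition) frequencies map a protein sequence of any length to a vector of fixed length 20^k. Second, features are ranked by the magnitude of a linear SVM's weights, and a fixed number of them is kept. Here the SVM is trained with a Pegasos-style subgradient solver without a bias term. The noise whitening step is omitted. The names `kmer_vector`, `train_linear_svm` and `top_k_features` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_vector(seq, k=1):
    """Map a sequence of any length to a fixed-length vector of k-mer frequencies.

    Assumed stand-in for vectorization: the output always has 20**k entries.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if km in index:  # skip k-mers containing non-standard residues
            counts[index[km]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Primal hinge-loss linear SVM (no bias) trained with Pegasos-style steps.

    Samples are visited in a fixed order, so the result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink (regularization)
            if margin < 1:  # hinge loss active: step toward this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def top_k_features(w, k):
    """Indices of the k features with the largest absolute SVM weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]
```

For example, `kmer_vector("MKV", k=2)` returns 400 values (one per dipeptide) regardless of sequence length. Sequences of different lengths therefore become comparable inputs. `top_k_features(w, k)` then keeps a fixed number of them.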