One of the important goals of bioinformatics is to classify and predict the functions of proteins that have no sequence homologs of known function. The purpose of this paper is to classify protein function using multi-parametric features, without relying on sequence similarity. First, we propose a method for generating novel features that capture local information in the protein sequence based on positively and negatively charged residues. Then, we describe a process for constructing an optimal feature subset by combining traditional features with the novel features extracted from the protein sequence. Finally, we classify ligase enzymes using a support vector machine (SVM). In our experiments, feature selection retained 375 of the 483 features, and the classification accuracy for the fourth-level sub-classes of the Enzyme Commission (EC) number was 98.35%. Our results demonstrate that most of the novel features are valuable for classifying specific enzyme functions.
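The abstract does not define the charged-residue features. The sketch below shows one possible family of such features, built on three assumptions:

- positively charged residues are K, R and H;
- negatively charged residues are D and E;
- "local information" means per-segment statistics over equal-length segments of the sequence.

The function names and the segmentation scheme are hypothetical. They illustrate the idea and are not the authors' definitions.

```python
"""Hypothetical local charge features for a protein sequence.

Assumption: the exact feature definitions are not given in the abstract.
This sketch splits the sequence into near-equal contiguous segments and
records the positive fraction, negative fraction, and net charge per
residue in each segment.
"""

# Assumed residue classes (one-letter amino-acid codes).
POSITIVE = frozenset("KRH")
NEGATIVE = frozenset("DE")


def segment_bounds(length: int, n_segments: int) -> list[tuple[int, int]]:
    """Split range(length) into n_segments contiguous, near-equal pieces."""
    return [
        (i * length // n_segments, (i + 1) * length // n_segments)
        for i in range(n_segments)
    ]


def local_charge_features(seq: str, n_segments: int = 4) -> list[float]:
    """Return 3 * n_segments features: (pos_frac, neg_frac, net) per segment."""
    seq = seq.upper()
    if len(seq) < n_segments:
        # Every segment must contain at least one residue.
        raise ValueError("sequence shorter than the number of segments")
    features: list[float] = []
    for start, end in segment_bounds(len(seq), n_segments):
        segment = seq[start:end]
        pos = sum(residue in POSITIVE for residue in segment)
        neg = sum(residue in NEGATIVE for residue in segment)
        size = len(segment)
        features.extend([pos / size, neg / size, (pos - neg) / size])
    return features
```

Features like these would be concatenated with traditional sequence features before feature selection. For example, `local_charge_features("KKDDAAAA", 2)` yields `[0.5, 0.5, 0.0, 0.0, 0.0, 0.0]`: the first half is half positive and half negative, and the second half carries no charge.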
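The abstract also does not name its feature-selection method. The sketch below is an illustration only: a correlation-based filter in the spirit of Hall's CFS. CFS scores a subset of k features by

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)

where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. The formula rewards features that predict the class but are not redundant with each other.

The sketch makes two assumptions. It uses absolute Pearson correlation as the correlation measure; CFS itself uses other measures for discrete data. It also uses a simple greedy forward search in place of CFS's best-first search. All function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: list[int], columns: list[list[float]], target: list[float]) -> float:
    """CFS-style merit of a non-empty subset of feature indices."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (
        sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns: list[list[float]], target: list[float]) -> list[int]:
    """Greedy forward selection: add the best feature until merit stops rising."""
    selected: list[int] = []
    best = 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(selected + [j], columns, target), j) for j in remaining]
        # On ties in merit, prefer the lower feature index.
        merit, j = max(scored, key=lambda t: (t[0], -t[1]))
        if merit <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
    return selected
```

In this toy setting, a feature that duplicates an already selected one does not raise the merit, so it is not added. The selected indices would then define the feature subset used to train the SVM classifier.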