Machine Learning
A fast algorithm for the maximum clique problem
Discrete Applied Mathematics - Sixth Twente Workshop on Graphs and Combinatorial Optimization
Feature analysis and classification of protein secondary structure data
ICANN/ICONIP'03 Proceedings of the 2003 joint international conference on Artificial neural networks and neural information processing
MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Hi-index | 0.00 |
Protein secondary structure prediction problem is one of the widely studied problems in bioinformatics. Predicting the secondary structure of a protein is an important step for determining its tertiary structure and thus its function. This paper explores the protein secondary structure problem using a novel feature selection algorithm combined with a machine learning approach based on random forests. For feature reduction, we propose an algorithm that uses a graph theoretical approach which finds cliques in the nonposition specific evolutionary profiles of proteins obtained from BLOSUM62. Then, the features selected by this algorithm are used for condensing the position specific evolutionary information obtained from PSI-BLAST. Our results show that we are able to save significant amount of space and time and still achieve high accuracy results even when the features of the data are 25% reduced.