Non-parametric classification of protein secondary structures

Authors:
Elias Zintzaras;Nigel P. Brown;Axel Kowald
Affiliations:
Department of Biomathematics, University of Thessaly School of Medicine, Papakyriazi 22, Larisa 41222, Greece;Biomedical Informatics Unit, Imperial Cancer Research Fund, London, UK;Max Planck Institute for Molecular Genetics, Berlin, Germany
Venue:
Computers in Biology and Medicine
Year:
2006

Abstract

Proteins were classified into their families using a classification tree method based on the coefficients of variation of physico-chemical and geometrical properties of their secondary structures. The tree method uses as its splitting criterion the increase in purity obtained when a node is split into two subnodes, and the size of the tree is controlled by a threshold on the improvement in the apparent misclassification rate (AMR) of the tree after each splitting step. The classification tree method appears effective in reproducing structural groupings similar to those obtained by dynamic programming. For comparison, we also applied two other methods: neural networks and support vector machines. The classification tree method performed better than both in classifying proteins into their families. The presented algorithm might be suitable for rapid preliminary classification of proteins into their corresponding families.
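
The tree-growing idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes Gini impurity as the purity measure, the population standard deviation in the coefficient of variation, and a per-node stopping test. The function names, the toy feature values, and the threshold of 0.05 are all hypothetical. Splitting one node changes the AMR of the whole tree by (errors in the node minus errors in its two subnodes) divided by the total number of cases, so that quantity is compared with the threshold.

```python
from collections import Counter
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD is an assumption here)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def errors(labels):
    """Cases misclassified when the node predicts its majority class."""
    return len(labels) - max(Counter(labels).values())

def best_split(X, y):
    """Return (purity_gain, feature, threshold) of the best binary split, or None."""
    best, n = None, len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between two observed values
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            gain = gini(y) - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(X, y, n_total, min_amr_gain):
    """Split on the largest purity gain; stop when the AMR improvement is too small."""
    split = best_split(X, y) if len(set(y)) > 1 else None
    if split is not None:
        _, f, t = split
        li = [i for i in range(len(y)) if X[i][f] <= t]
        ri = [i for i in range(len(y)) if X[i][f] > t]
        yl, yr = [y[i] for i in li], [y[i] for i in ri]
        # Reduction in the whole tree's AMR caused by this single split.
        amr_gain = (errors(y) - errors(yl) - errors(yr)) / n_total
        if amr_gain >= min_amr_gain:
            return {"feature": f, "threshold": t,
                    "left": grow([X[i] for i in li], yl, n_total, min_amr_gain),
                    "right": grow([X[i] for i in ri], yr, n_total, min_amr_gain)}
    return {"label": Counter(y).most_common(1)[0][0]}

def predict(tree, row):
    """Follow the splits down to a leaf and return its family label."""
    while "label" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["label"]

# Hypothetical toy data: each row holds two CV features for one protein.
X = [[0.10, 0.50], [0.15, 0.40], [0.60, 0.50],
     [0.70, 0.45], [0.65, 0.90], [0.62, 0.95]]
y = ["A", "A", "B", "B", "C", "C"]
tree = grow(X, y, n_total=len(y), min_amr_gain=0.05)
predictions = [predict(tree, row) for row in X]
```

In this sketch a larger `min_amr_gain` yields a smaller tree, which mirrors the size control described in the abstract. In the extreme case the root is never split and every protein receives the majority label.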