Principles of multivariate analysis: a user's perspective
IEEE Transactions on Pattern Analysis and Machine Intelligence
Statistical and neural classifiers: an integrated approach to design
Predicting protein folding pathways
Bioinformatics
Non-parametric classification of protein secondary structures
Computers in Biology and Medicine
A methodology is proposed for testing the correlation between the sequence and structure distances of proteins. Structure distances were derived by applying a forward-growing classification tree algorithm to defined physico-chemical and geometrical properties of the structures. The structure distance for every pair of proteins was defined as the number of intermediate nodes between them in the tree. Sequence distances were derived using pairwise sequence alignment. The correlation between the structure distance matrix and the sequence distance matrix was then tested using a Monte Carlo permutation test. The results were compared with those obtained when the double dynamic programming structure alignment method (SSAP) was applied. The methodology was applied to a data set of 74 proteins belonging to 14 families. The classification tree was able to identify the protein families (the misclassification rate was R=1.4%) and a 74x74 structure distance matrix was produced. For every pair of protein sequences a dissimilarity score was recorded and a sequence distance matrix was produced. The Monte Carlo permutation test produced a correlation coefficient r=0.403 (P
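As a rough illustration of how a Monte Carlo permutation test between two distance matrices can work, the sketch below uses a Mantel-style procedure: it correlates the upper triangles of the two matrices, then repeatedly relabels the items of one matrix to build a null distribution. The abstract does not state which correlation statistic or permutation scheme was used, so the choice of Pearson correlation, the one-sided P-value, the function names (`upper_triangle`, `pearson`, `permutation_test`) and the toy matrices are all assumptions made for illustration only.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_test(struct_dist, seq_dist, n_permutations=999, seed=0):
    """Mantel-style test (an assumed scheme, not necessarily the paper's).

    Rows and columns of struct_dist are permuted together, which relabels
    the proteins while keeping the matrix a valid distance matrix.
    Returns the observed correlation and a one-sided P-value.
    """
    rng = random.Random(seed)
    n = len(struct_dist)
    seq_flat = upper_triangle(seq_dist)
    observed = pearson(upper_triangle(struct_dist), seq_flat)

    hits = 0
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[struct_dist[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), seq_flat) >= observed:
            hits += 1

    # Count the observed arrangement itself so the P-value is never zero.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (hypothetical): five items placed on a line, with the "sequence"
# distances set to exactly twice the "structure" distances.
positions = [0, 1, 2, 4, 7]
toy_struct = [[abs(a - b) for b in positions] for a in positions]
toy_seq = [[2 * d for d in row] for row in toy_struct]

r, p = permutation_test(toy_struct, toy_seq)
print(f"r = {r:.3f}, P = {p:.3f}")
```

Permuting rows and columns together, rather than shuffling individual entries, preserves the dependence among distances that share a protein, which is why this scheme is commonly used for comparing distance matrices.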