Measuring the similarity of protein structures by means of the universal similarity metric

Authors:
N. Krasnogor;D. A. Pelta
Affiliations:
Automated Scheduling, Optimisation and Planning Group, University of Nottingham, Nottingham, NG8 1BB, UK;Department of Computer Science and Artificial Intelligence, E.T.S.I. Informatica, Universidad de Granada, 18071 Granada, Spain
Venue:
Bioinformatics
Year:
2004

Citing 0
Cited 19

Self Generating Metaheuristics in Bioinformatics: The Proteins Structure Comparison Case
Genetic Programming and Evolvable Machines

Information distance from a question to an answer
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining

Aligning sequences by minimum description length
EURASIP Journal on Bioinformatics and Systems Biology

Information shared by many objects
Proceedings of the 17th ACM conference on Information and knowledge management

A framework for developing optimization-based decision support systems
Expert Systems with Applications: An International Journal

New information distance measure and its application in question answering system
Journal of Computer Science and Technology

Protein Comparison by the Alignment of Fuzzy Energy Signatures
RSKT '09 Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology

Evaluating Protein Similarity from Coarse Structures
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

A new dot plot-based algorithm for genomes sequences comparison: A preliminary study
Expert Systems with Applications: An International Journal

A fuzzy sets based generalization of contact maps for the overlap of protein structures
Fuzzy Sets and Systems

Designing a methodology to estimate complexity of protein structures
ECAL'07 Proceedings of the 9th European conference on Advances in artificial life

Protein structure alignment using maximum cliques and local search
AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence

Information distance and its extensions
DS'11 Proceedings of the 14th international conference on Discovery science

Protein structure comparison based on a measure of information discrepancy
TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation

Information distance and its applications
CIAA'06 Proceedings of the 11th international conference on Implementation and Application of Automata

Impugning Randomness, Convincingly
Studia Logica

Classifying stem cell differentiation images by information distance
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I

Community evolution detection in time-evolving information networks
Proceedings of the Joint EDBT/ICDT 2013 Workshops

Towards UCI+: A mindful repository design
Information Sciences: an International Journal

Abstract

Motivation: As the number of available protein structures grows, so does the need for algorithms that can quantify the similarity between them. The comparison of protein structures, and their clustering according to a given similarity measure, is therefore at the core of today's biomedical research. In this paper, we show how the Universal Similarity Metric (USM), a measure inspired by algorithmic information theory, can be used to calculate similarities between pairs of proteins. The method, besides being theoretically supported, is surprisingly simple to implement and computationally efficient.

Results: Structural similarity between proteins in four different datasets was measured using the USM. The samples represented alpha, beta, alpha-beta, TIM-barrel, globin and serpin protein types. The proposed metric allows a correct measurement of similarity and a correct classification of the proteins in the four datasets.

Availability: All the scripts and programs used in the preparation of this paper are available at http://www.cs.nott.ac.uk/~nxk/USM/protocol.html. That web page gives a brief description of how to use the various scripts and programs.

Supplementary information: The protein datasets used are collected at http://www.cs.nott.ac.uk/~nxk/USM/datasets.html. The calculated similarity values for the proteins used in this paper can be found at http://www.cs.nott.ac.uk/~nxk/USM/similar.html. The clustering of the datasets based on these similarity values can be found at http://www.cs.nott.ac.uk/~nxk/USM/clustering.html.
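
The abstract does not state the formula or the implementation, so the following is only a minimal sketch of how a compression-based similarity of this kind is commonly approximated, not the authors' scripts. It uses the normalized compression distance, a standard computable stand-in for Kolmogorov-complexity-based metrics, with `zlib` as an assumed compressor. The function names (`compressed_size`, `ncd`, `distance_matrix`) are hypothetical, and how a protein structure is serialized to bytes (for example, a contact map written out as text) is also an assumption left to the caller.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough proxy for
    Kolmogorov complexity (assumption: any real compressor only
    approximates it from above)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # complexity of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """All-pairs distances, suitable as input to a clustering routine."""
    n = len(items)
    return [[ncd(items[i], items[j]) for j in range(n)] for i in range(n)]
```

A matrix produced this way could be passed to any standard hierarchical clustering routine. Because `zlib` has a 32 KB window, very long serializations may need a different compressor for the approximation to remain meaningful.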