On Complexity Measures for Biological Sequences

Authors:
Fei Nan;Donald Adjeroh
Affiliations:
West Virginia University;West Virginia University
Venue:
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year:
2004

Abstract

In this work, we perform an empirical study of different published measures of complexity for general sequences, to determine their effectiveness in dealing with biological sequences. By effectiveness, we refer to how closely the given complexity measure is able to identify known biologically relevant relationships, such as closeness on a phylogenetic tree. In particular, we study three complexity measures, namely, the traditional Shannon's entropy, linguistic complexity, and T-complexity. For each complexity measure, we construct the complexity profile for each sequence in our test set, and based on the profiles we compare the sequences using different performance measures based on: (i) the information-theoretic divergence measure of relative entropy; (ii) apparent periodicity in the complexity profile; and (iii) correct phylogeny. The preliminary results show that T-complexity was the least effective in identifying previously established associations between the sequences in our test set. Shannon's entropy and linguistic complexity provided better results, with Shannon's entropy having the upper hand.
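
To make the notions of a complexity profile and relative entropy concrete, the following is a minimal Python sketch. It assumes the profile is computed with a fixed-width sliding window. The abstract does not state the window size, step, or exact profile construction used in the paper, so the function names, the `width` and `step` parameters, and the example sequence are all illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in `window`."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(seq: str, width: int = 4, step: int = 1) -> list:
    """Entropy of each sliding window of length `width` (assumed profile definition)."""
    return [
        shannon_entropy(seq[i:i + width])
        for i in range(0, len(seq) - width + 1, step)
    ]


def relative_entropy(p: list, q: list) -> float:
    """Relative entropy D(p || q) in bits between two discrete distributions.

    `p` and `q` must have the same length and each sum to 1.
    Terms with p_i == 0 contribute nothing; q_i == 0 where p_i > 0 is an error.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q must be positive wherever p is positive")
            total += pi * math.log2(pi / qi)
    return total


# Hypothetical toy sequence: a low-complexity run followed by mixed bases.
profile = entropy_profile("AAAACGT", width=4)
```

In this toy example the profile rises from 0 bits for the window `AAAA` to 2 bits for `ACGT`, the maximum for a four-letter alphabet. How the profiles are turned into distributions before relative entropy is applied is not described in the abstract, so `relative_entropy` is shown only for generic discrete distributions.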