On Complexity Measures for Biological Sequences

  • Authors:
  • Fei Nan; Donald Adjeroh

  • Affiliations:
  • West Virginia University; West Virginia University

  • Venue:
  • CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
  • Year:
  • 2004

Abstract

In this work, we perform an empirical study of several published complexity measures for general sequences to determine how effective they are on biological sequences. By effectiveness, we mean how closely a given complexity measure identifies known biologically relevant relationships, such as closeness on a phylogenetic tree. In particular, we study three complexity measures: traditional Shannon's entropy, linguistic complexity, and T-complexity. For each measure, we construct a complexity profile for every sequence in our test set. Based on these profiles, we compare the sequences using performance measures based on: (i) relative entropy, an information-theoretic divergence measure; (ii) apparent periodicity in the complexity profile; and (iii) correct phylogeny. Preliminary results show that T-complexity was the least effective at identifying previously established associations between the sequences in our test set. Shannon's entropy and linguistic complexity gave better results, with Shannon's entropy performing best.
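The abstract names Shannon's entropy, linguistic complexity, complexity profiles, and relative entropy, but gives no formulas or parameters. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It computes a sliding-window complexity profile and compares two profiles by relative entropy. Several choices here are assumptions: the window length and step, binning each profile into a histogram over a fixed range, the pseudocount, a 4-letter alphabet, and the sum-ratio formulation of linguistic complexity. All helper names are hypothetical. T-complexity and the periodicity and phylogeny measures are not sketched.

```python
import math
from collections import Counter
from typing import Callable, List

# Minimal sketch; window sizes, binning, and the linguistic-complexity
# formulation are assumptions, not the parameters used in the paper.


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol frequencies in s."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def linguistic_complexity(s: str, alphabet_size: int = 4) -> float:
    """Distinct substrings observed in s divided by the maximum possible.

    For each length k, at most min(alphabet_size**k, len(s) - k + 1)
    distinct k-mers can occur. Summing over all k and taking the ratio is
    one published formulation (assumed here); others multiply per-k ratios.
    """
    n = len(s)
    observed = maximum = 0
    for k in range(1, n + 1):
        observed += len({s[i:i + k] for i in range(n - k + 1)})
        maximum += min(alphabet_size ** k, n - k + 1)
    return observed / maximum


def complexity_profile(seq: str, measure: Callable[[str], float],
                       window: int = 20, step: int = 1) -> List[float]:
    """Apply `measure` to each sliding window of length `window`."""
    return [measure(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def profile_distribution(profile: List[float], lo: float, hi: float,
                         bins: int = 10, pseudocount: float = 1e-6) -> List[float]:
    """Histogram a profile over [lo, hi] and normalize it to a distribution.

    The pseudocount keeps every bin non-zero so relative entropy stays finite.
    """
    counts = [pseudocount] * bins
    width = (hi - lo) / bins
    for v in profile:
        # Clamp so values at (or slightly outside) the range edges land in a bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def relative_entropy(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the paper.
    seq_a = "ACGTACGTTAGCATGCATCGATCGGATCCATGCAAGT"
    seq_b = "AAAAAAAATTTTTTTTAAAAAAAACCCCGGGGAAAATT"
    prof_a = complexity_profile(seq_a, shannon_entropy, window=8)
    prof_b = complexity_profile(seq_b, shannon_entropy, window=8)
    # Shannon entropy over a 4-letter alphabet lies in [0, 2] bits.
    p = profile_distribution(prof_a, lo=0.0, hi=2.0)
    q = profile_distribution(prof_b, lo=0.0, hi=2.0)
    print(f"D(a || b) = {relative_entropy(p, q):.3f} bits")
```

Histogramming each profile onto a shared range is one way to compare sequences of different lengths. If the sequences were aligned, comparing profiles position by position would be an alternative. Linguistic complexity lies in [0, 1], so it would be binned with `hi=1.0`.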