Complexity profiles of DNA sequences using finite-context models

  • Authors:
  • Armando J. Pinho;Diogo Pratas;Sara P. Garcia

  • Affiliations:
  • Signal Processing Lab, IEETA / DETI, University of Aveiro, Aveiro, Portugal;Signal Processing Lab, IEETA / DETI, University of Aveiro, Aveiro, Portugal;Signal Processing Lab, IEETA / DETI, University of Aveiro, Aveiro, Portugal

  • Venue:
  • USAB'11 Proceedings of the 7th conference on Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society: Information Quality in e-Health
  • Year:
  • 2011

Abstract

Every data compression method assumes a certain model of the information source that produces the data. When we improve a data compression method, we are also improving the model of the source. This happens because, when the probability distribution of the assumed source model is closer to the true probability distribution of the source, a smaller relative entropy results and, therefore, fewer redundancy bits are required. This is why the importance of data compression goes beyond the usual goal of reducing the storage space or the transmission time of the information. In fact, in some situations, seeking better models is the main aim. In our view, this is the case for DNA sequence data. In this paper, we give hints on how finite-context (Markov) modeling may be used for DNA sequence analysis, through the construction of complexity profiles of the sequences. These profiles are able to unveil structures of the DNA, some of them with potential biological relevance.
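
As an illustration of the idea of a complexity profile, the following is a minimal sketch and not the authors' implementation. It assumes an adaptive order-k finite-context model over the alphabet {A, C, G, T}, with additive smoothing to estimate probabilities. The function name `complexity_profile` and the parameters `order` and `alpha` are hypothetical and chosen only for this example. For each base, the model estimates the probability of that base given the preceding `order` bases and records the information content, -log2 of that probability, in bits.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: the sequence contains only these four symbols


def complexity_profile(seq, order=2, alpha=1.0):
    """Per-base information content (in bits) under an adaptive
    finite-context (Markov) model of the given order.

    Illustrative sketch only: `order` and `alpha` are hypothetical
    parameters, not values taken from the paper.
    """
    # counts[context][symbol] = times `symbol` followed `context` so far
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    profile = []
    for i, symbol in enumerate(seq):
        # The context is the (up to) `order` bases preceding position i
        context = seq[max(0, i - order):i]
        ctx_counts = counts[context]
        total = sum(ctx_counts.values())
        # Additive smoothing keeps every probability strictly positive
        prob = (ctx_counts[symbol] + alpha) / (total + alpha * len(ALPHABET))
        profile.append(-math.log2(prob))
        # Update the model only after the symbol has been "coded"
        ctx_counts[symbol] += 1
    return profile


if __name__ == "__main__":
    for base, bits in zip("ACGTACGTACGTACGT",
                          complexity_profile("ACGTACGTACGTACGT")):
        print(base, round(bits, 3))
```

In this sketch, positions the model predicts well (for example, inside a repeat) cost few bits, while poorly predicted positions cost close to or more than 2 bits, so a plot of the values along the sequence would show regions of low and high complexity. In practice such a profile would probably be smoothed before being inspected.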