Using sequence compression to speedup probabilistic profile matching

  • Authors:
  • Valerio Freschi;Alessandro Bogliolo

  • Affiliations:
  • Information Science and Technology Institute, University of Urbino 61029 Urbino, Italy;Information Science and Technology Institute, University of Urbino 61029 Urbino, Italy

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Quantified Score

Hi-index 3.84

Abstract

Motivation: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is better suited than a deterministic pattern to capturing the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) time to match a sequence of N characters against a profile of length P ≪ N.

Results: In this work, we exploit string compression techniques to speed up brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.

Contact: bogliolo@sti.uniurb.it
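To make the O(NP) baseline concrete, the following is a minimal sketch of brute-force profile matching. It is not the authors' code. The function name `brute_force_profile_scores`, the representation of the profile as a list of per-column dictionaries, the default score of 0.0 for unlisted characters, and the toy scores are all assumptions made for illustration. The run-length and LZ78 algorithms from the paper are not reproduced here.

```python
def brute_force_profile_scores(sequence, profile):
    """Score every length-P window of `sequence` against `profile`.

    `profile` is assumed to be a list of P dicts, one per profile column,
    each mapping a character to its score at that column.
    Returns a list of N - P + 1 window scores (empty if N < P).
    """
    n, p = len(sequence), len(profile)
    scores = []
    for start in range(n - p + 1):        # N - P + 1 alignment positions
        total = 0.0
        for j in range(p):                # P column lookups per position
            # Characters missing from a column score 0.0 (an assumption).
            total += profile[j].get(sequence[start + j], 0.0)
        scores.append(total)
    return scores


# Toy two-column profile that favours "A" followed by "C" (hypothetical scores).
profile = [
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
]
scores = brute_force_profile_scores("AACAC", profile)
print(scores)  # [1.0, 4.0, -2.0, 4.0]
```

The nested loops perform roughly N times P score lookups, which is the cost the abstract attributes to brute-force matching. In a sequence with long runs of the same character, adjacent windows repeat many of the same lookups, which gives some intuition for why working on a compressed representation could save work.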