Parallel Algorithms for the Analysis of Biological Sequences

  • Authors: Giancarlo Mauri; Giulio Pavesi

  • Venue: PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
  • Year: 2001

Abstract

In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over an alphabet of four symbols (DNA/RNA) or twenty symbols (proteins). For computational biologists, the main challenge now is to provide efficient tools for analyzing and comparing these sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. Occurrences of a substring can be approximate, that is, they can differ by up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.