Parallel Algorithms for the Analysis of Biological Sequences

Authors:
Giancarlo Mauri;Giulio Pavesi
Affiliations:
-;-
Venue:
PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies
Year:
2001

Citing 4
Cited 0

Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
A fast string searching algorithm

Communications of the ACM
Spelling Approximate Repeated or Common Motifs Using a Suffix Tree

LATIN '98 Proceedings of the Third Latin American Symposium on Theoretical Informatics
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)

Quantified Score

Hi-index	0.00

Visualization

Abstract

In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over an alphabet of four (DNA/RNA) or twenty symbols (proteins). For computational biologists the main challenge now is to provide efficient tools for the analysis and the comparison of the sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. The occurrences of the substrings can be approximate, that is, can differ up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.