A parallel algorithm for pattern discovery in biological sequences

Authors:
Giancarlo Mauri;Giulio Pavesi
Affiliations:
Department of Computer Science, Systems and Communication, University of Milan-Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milan, Italy;Department of Computer Science, Systems and Communication, University of Milan-Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milan, Italy
Venue:
Future Generation Computer Systems - Parallel computing technologies (PaCT-2001)
Year:
2002

Abstract

Software has become as essential to molecular biologists as the Bunsen burner was a few decades ago. Biological data come mainly in the form of DNA or protein sequences, i.e., strings over alphabets of four or 20 symbols, respectively. The main challenge now is to develop efficient and powerful algorithms that extract as much meaning as possible from the huge amount of data generated in the last few years. In this paper we present a parallel pattern discovery algorithm that, given a set of functionally related sequences, finds the substrings that occur in all (or most) of the sequences in the set. The occurrences of the substrings can be approximate; that is, they can differ by up to a maximum number of mismatches that depends on the length of the substring.
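
To make the problem statement concrete, the following is a minimal brute-force sketch of the task described above. It is not the paper's parallel algorithm. The function names, the `error_rate` parameter (used here to derive the mismatch bound from the pattern length), and the `min_sequences` quorum are all assumptions introduced for illustration only.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approx(pattern, sequence, max_mismatches):
    """True if some window of `sequence` matches `pattern`
    with at most `max_mismatches` substitutions."""
    m = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + m]) <= max_mismatches
        for i in range(len(sequence) - m + 1)
    )


def discover_patterns(sequences, length, min_sequences,
                      error_rate=0.0, alphabet="ACGT"):
    """Enumerate every string of the given length over `alphabet` and keep
    those occurring approximately in at least `min_sequences` sequences.

    Assumption: the mismatch bound grows with pattern length as
    floor(error_rate * length); the abstract only says it depends on length.
    """
    max_mismatches = int(error_rate * length)
    found = []
    for letters in product(alphabet, repeat=length):
        pattern = "".join(letters)
        hits = sum(
            occurs_approx(pattern, s, max_mismatches) for s in sequences
        )
        if hits >= min_sequences:
            found.append(pattern)
    return found


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GACGTT"]
    # Exact patterns of length 3 present in all three sequences.
    print(discover_patterns(seqs, length=3, min_sequences=3))
    # Relaxing the quorum to "most" (two of three) sequences.
    print(discover_patterns(seqs, length=3, min_sequences=2))
```

This enumeration tries all |alphabet|^length candidates, so it is practical only for short patterns. Each candidate is checked independently of the others, which is one reason the problem lends itself to parallel treatment; how the paper actually distributes the work is not stated in the abstract.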