Efficient distributed computation of maximal exact matches

Authors:
Mohamed Abouelhoda;Sondos Seif
Affiliations:
Faculty of Engineering, Cairo University, Giza, Egypt and Center for Informatics Sciences, Nile University, Giza, Egypt; Center for Informatics Sciences, Nile University, Giza, Egypt
Venue:
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Year:
2012

Abstract

Given two long strings S and T representing two genomic sequences, and a user-defined threshold ℓ, the problem of computing maximal exact matches (MEMs) is to find each triple (p1,p2,l) specifying two matching substrings S[p1..p1+l−1]=T[p2..p2+l−1] such that l≥ℓ, S[p1−1]≠T[p2−1], and S[p1+l]≠T[p2+l]. Computing MEMs is a major problem in bioinformatics because it is a primary step in identifying regions of similarity among genomic sequences. Faster solutions to this problem are still needed to cope with the ever-increasing volume of genomic sequences to be compared with one another. In this paper, we present a parallel version of the MEM algorithm that runs on a computer cluster. Our experimental results show that our algorithm is efficient and scalable.
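To make the definition above concrete, the following is a minimal brute-force sketch in Python. It is not the parallel algorithm presented in the paper, which the abstract does not describe in detail; it is only a quadratic-time reference that checks the three conditions directly. The function name `maximal_exact_matches`, the parameter `min_len` (standing in for the user-defined threshold), and the use of 0-based positions are assumptions made for illustration. A match that starts at the beginning or stops at the end of either string is treated as maximal on that side.

```python
def maximal_exact_matches(s, t, min_len):
    """Return (p1, p2, length) triples (0-based) for every maximal exact
    match between s and t whose length is at least min_len.

    Naive O(len(s) * len(t)) reference, for illustration only.
    """
    mems = []
    for p1 in range(len(s)):
        for p2 in range(len(t)):
            if s[p1] != t[p2]:
                continue
            # Left-maximality: if the preceding characters also match,
            # this pair lies inside a longer match that starts earlier.
            if p1 > 0 and p2 > 0 and s[p1 - 1] == t[p2 - 1]:
                continue
            # Extend to the right while characters agree; stopping at the
            # first mismatch (or a string end) gives right-maximality.
            length = 0
            while (p1 + length < len(s) and p2 + length < len(t)
                   and s[p1 + length] == t[p2 + length]):
                length += 1
            if length >= min_len:
                mems.append((p1, p2, length))
    return mems
```

For example, `maximal_exact_matches("abcde", "xbcdy", 2)` returns `[(1, 1, 3)]`: the substring `bcd` matches at position 1 in both strings and cannot be extended in either direction. Practical tools replace the double loop with an index such as a suffix array, which is what makes large genomic inputs feasible.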