An Efficient Parallel Algorithm for the Multiple Longest Common Subsequence (MLCS) Problem

Authors:
Dmitry Korkin;Qingguo Wang;Yi Shang
Venue:
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Year:
2008

Abstract

Finding the multiple longest common subsequence (MLCS) is an important problem in the areas of bioinformatics and computational genomics. Approaches that are more efficient than the standard dynamic programming method have been introduced and successfully parallelized for the special case of two sequences. However, the increasing complexity and size of biological data require an efficient method applicable to an arbitrary number of sequences, as well as its efficient parallelization. A recently developed dominant points method for the general MLCS problem has shown a significant performance improvement over the dynamic programming method when the number of sequences is larger than two. At the same time, the approach has revealed a strong need for parallelization if it is to be applied to larger families of sequences or to longer sequences. In this paper, we introduce an efficient parallel algorithm to find an MLCS for an arbitrary number of sequences, which is based on the dominant points method. When the number of processors is not greater than the size of the alphabet multiplied by the number of sequences, the parallel algorithm is estimated to have asymptotically linear speedup. We experimentally tested the algorithm using sets of randomly generated sequences over different alphabets as well as protein sequences from a family of homologous proteins. We found that the performance of the algorithm increases with the number of input sequences and reaches a near-linear speedup for eight sequences.
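
For orientation, the standard dynamic programming baseline that the abstract compares against can be sketched as follows. This is a minimal illustration, not the paper's dominant points method or its parallel algorithm. The function name `mlcs_length` and the memoized-recursion formulation are assumptions made for this sketch. It computes only the length of an MLCS, and its state space grows as the product of the sequence lengths, which is why the abstract calls for more efficient methods.

```python
from functools import lru_cache


def mlcs_length(sequences):
    """Length of a longest subsequence common to all input strings.

    Naive dynamic programming over prefix-length tuples; intended for
    short inputs only (the recursion depth grows with the total length).
    """
    seqs = tuple(sequences)
    if not seqs:
        return 0

    @lru_cache(maxsize=None)
    def solve(positions):
        # positions[i] is the length of the prefix of seqs[i] being considered.
        if any(p == 0 for p in positions):
            return 0
        last_symbols = {s[p - 1] for s, p in zip(seqs, positions)}
        if len(last_symbols) == 1:
            # Every prefix ends with the same symbol: match it in all sequences.
            return 1 + solve(tuple(p - 1 for p in positions))
        # Otherwise at least one final symbol is unused: try dropping each in turn.
        return max(
            solve(positions[:i] + (p - 1,) + positions[i + 1:])
            for i, p in enumerate(positions)
        )

    return solve(tuple(len(s) for s in seqs))
```

Calling `mlcs_length(["ABCBDAB", "BDCABA"])` exercises the two-sequence special case mentioned in the abstract. Passing three or more strings uses the same recurrence over a higher-dimensional index tuple.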