Efficient dominant point algorithms for the multiple longest common subsequence (MLCS) problem
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
A hyper-heuristic for the Longest Common Subsequence problem
Computational Biology and Chemistry
Quick-MLCS: a new algorithm for the multiple longest common subsequence problem
Proceedings of the Fifth International C* Conference on Computer Science and Software Engineering
On supernode transformations and multithreading for the longest common subsequence problem
AusPDC '12 Proceedings of the Tenth Australasian Symposium on Parallel and Distributed Computing - Volume 127
A Case Study of Implementing Supernode Transformations
International Journal of Parallel Programming
Hi-index | 0.00 |
Finding the multiple longest common subsequence (MLCS) is an important problemin the areas of bioinformatics and computational genomics. Approaches that are more efficient than the standard dynamic programming method have been introduced and successfully parallelized for the special cases of 2 sequences. However, the increasing complexity and size of biological data require an efficient method applicable to an arbitrary number of sequences as well as its efficient parallelization. A recently developed dominant points method for a general MLCS problem has been shown a significant performance improvement over the dynamic programming method, when number of sequences is larger than two. At the same time, the approach has revealed strong demand for its parallelization, in order to be applied to the larger families of sequences or sequences of the greater lengths. In this paper, we introduce an efficient parallel algorithm to find a MLCS for an arbitrary number of sequences, which is based on the dominant points method. When the number of processors is not greater than the size of alphabet multiplied by the number of sequences, the parallel algorithm is estimated to have the asymptotically linear speed up. We experimentally tested the algorithm using sets of randomly generated sequences over different alphabets as well as the protein sequences from a family of homologous proteins. We found that the performance of the algorithm increases with the number of input sequences and reaches a near-linear speedup for eight sequences.