The longest common subsequence (LCS) problem is an important problem in computer science with many applications, such as DNA sequence matching in bioengineering and file comparison in the UNIX diff utility. While much research has been devoted to finding efficient solutions to this problem, the emphasis has shifted with the advent of multi-core architectures toward multithreaded implementations. This paper applies supernode transformations to partition the dynamic programming solution of the LCS problem into multiple threads. We then enhance this method with a transformation matrix that skews the loop nest so that the loop-carried dependencies of the inner loop are eliminated within each supernode. We find that this technique performs well on microarchitectures supporting out-of-order execution, whereas in-order execution machines do not benefit from it. Furthermore, we present a variation of the supernode transformation and multithreading strategy that groups entire rows of the index set into a supernode, with inter-thread synchronization performed through an array of mutexes. We find that this scheme reduces thread-management overhead and improves data locality. A formula for the total execution time of each method is presented. The techniques are benchmarked on a 12-core and a four-core machine. On the 12-core machine, the traditional supernode transformation speeds up the original loop nest by a factor of 16.7. Our enhanced technique achieves a 42.6× speedup, and our new method achieves a 59.5× speedup. We observe super-linear speedup, as the performance gain exceeds the number of processing cores. The concepts presented and discussed in this paper for the LCS problem are generally applicable to algorithms with regular dependencies.
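For readers unfamiliar with the baseline being parallelized, the dynamic programming recurrence for the LCS problem can be sketched as below. This is a minimal sequential sketch, not the paper's multithreaded implementation; the function name `lcs_length` and the use of a variable-length array are illustrative choices. The comments point out the north, west, and north-west cell dependencies, which are the uniform (regular) dependencies that the paper's supernode transformations partition into coarse-grained blocks for multithreading.

```c
#include <string.h>

/* Classic O(m*n) dynamic-programming solution of the LCS problem.
 * c[i][j] holds the length of the LCS of x[0..i-1] and y[0..j-1].
 * Each interior cell depends on its north (c[i-1][j]), west
 * (c[i][j-1]), and north-west (c[i-1][j-1]) neighbours -- the
 * uniform loop-carried dependencies that a supernode (tiling)
 * transformation groups into blocks executed by separate threads. */
int lcs_length(const char *x, const char *y) {
    int m = (int)strlen(x), n = (int)strlen(y);
    /* VLA for brevity; a production version would allocate on the
       heap and could keep only two rows to reduce memory use. */
    int c[m + 1][n + 1];
    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                c[i][j] = 0;                      /* empty prefix */
            else if (x[i - 1] == y[j - 1])
                c[i][j] = c[i - 1][j - 1] + 1;    /* north-west dep. */
            else
                c[i][j] = c[i - 1][j] > c[i][j - 1]
                        ? c[i - 1][j]             /* north dep. */
                        : c[i][j - 1];            /* west dep. */
        }
    }
    return c[m][n];
}
```

Because every cell in row i depends only on row i-1 and on the cell to its left, a thread assigned a block of rows can start as soon as the thread above it has made sufficient column progress, which is the property the row-grouped supernode variant with an array of mutexes exploits.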