On supernode transformations and multithreading for the longest common subsequence problem

  • Authors:
  • Johann Steinbrecher;Weijia Shang

  • Affiliations:
  • Santa Clara University, Santa Clara, CA;Santa Clara University, Santa Clara, CA

  • Venue:
  • AusPDC '12 Proceedings of the Tenth Australasian Symposium on Parallel and Distributed Computing - Volume 127
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The longest common subsequence (LCS) problem is an important algorithm in computer science with many applications such as DNA matching (bio-engineering) and file comparison (UNIX diff). While there has been a lot of research for finding an efficient solution to this problem, the research emphasis has shifted with the advent of multi-core architectures towards multithreaded implementations. This paper applies supernode transformations to partition the dynamic programming solution of the LCS problem into multiple threads. Then, we enhance this method by presenting a transformation matrix that skews the loop nest such that loop carried dependencies of the inner loop are eliminated in each supernode. We find that this technique performs well on microarchitectures supporting out-of-order execution while in-order execution machines do not benefit from it. Furthermore, we present a variation of the supernode transformations and multithreading strategy which groups entire rows of the index set to form a supernode. The inter thread synchronization is performed by an array of mutexes. We find that this scheme reduces the amount of thread management overhead and improves the data locality. A formula for the total execution time of each method is presented. The techniques are benchmarked on a 12 core and a four core machine. At the 12 core machine the traditional supernode transformation speeds up the original loop nest 16.7 times. We enhance this technique to score a 42.6 speedup and apply our new method scoring a 59.5 speedup. We experience the phenomenon of super-linear speedup as the the performance gain is larger than the number of processing cores. Concepts presented and discussed in this paper on the LCS problem are generally applicable to regular dependence algorithms.