Lessons learned from exploring the backtracking paradigm on the GPU

  • Authors:
  • John Jenkins; Isha Arkatkar; John D. Owens; Alok Choudhary; Nagiza F. Samatova

  • Affiliations:
  • North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN; North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN; University of California, Davis, Davis, CA; Northwestern University, Evanston, IL; North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN

  • Venue:
  • Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
  • Year:
  • 2011


Abstract

We explore the backtracking paradigm with properties seen as sub-optimal for GPU architectures, using the maximal clique enumeration problem as a case study, and find that the presence of these properties limits GPU performance to approximately 1.4-2.25 times that of a single CPU core. The GPU performance "lessons" we find critical to achieving this performance include a coarse- and fine-grain parallelization of the search space, a low-overhead load-balanced distribution of work, global memory latency hiding through coalescing, saturation, and shared memory utilization, and the use of GPU output buffering as a solution to irregular workloads and a large solution domain. We also find a strong reliance on an efficient global representation of the problem structure, which bounds any efficiencies gained from these lessons, and discuss the implications of these results for backtracking problems in general.
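
The abstract's mention of GPU output buffering for irregular workloads can be illustrated with a small sketch. The following CUDA kernel is not the authors' implementation; it is a minimal sketch, under the assumption that each thread produces a variable number of results and reserves contiguous slots in a global buffer via an atomic counter. The names (search_kernel, Result, work_items, out_count) are hypothetical.

```cuda
#include <cuda_runtime.h>

struct Result {
    int vertex;   // e.g., last vertex added to a maximal clique (illustrative)
    int depth;    // e.g., size of the enumerated clique (illustrative)
};

// Each thread may emit zero or more results. Instead of writing to fixed
// per-thread slots (which wastes space and scatters writes), threads reserve
// contiguous slots in a shared output buffer with a single atomicAdd.
__global__ void search_kernel(const int *work_items, int n_items,
                              Result *out_buf, int out_capacity,
                              int *out_count)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_items) return;

    // Placeholder for the per-thread backtracking step: here we pretend each
    // work item yields between 0 and 2 results (purely illustrative).
    int n_found = work_items[tid] % 3;

    // Reserve n_found contiguous slots in the global output buffer.
    int base = atomicAdd(out_count, n_found);

    for (int i = 0; i < n_found; ++i) {
        int slot = base + i;
        if (slot < out_capacity) {          // guard against buffer overflow
            out_buf[slot].vertex = work_items[tid];
            out_buf[slot].depth  = i;
        }
    }
    // If *out_count exceeds out_capacity after the kernel, the host can grow
    // the buffer and re-run the overflowing portion of the search.
}
```

In this pattern the output buffer decouples the irregular, data-dependent result counts from the regular thread grid, which is one way the large and unpredictable solution domain of maximal clique enumeration can be accommodated on the GPU.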