Lessons learned from exploring the backtracking paradigm on the GPU

Authors:
John Jenkins; Isha Arkatkar; John D. Owens; Alok Choudhary; Nagiza F. Samatova
Affiliations:
North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN; North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN; University of California, Davis, Davis, CA; Northwestern University, Evanston, IL; North Carolina State University, Raleigh, NC and Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
Euro-Par'11: Proceedings of the 17th International Conference on Parallel Processing, Volume Part II
Year:
2011

Abstract

We explore the backtracking paradigm, whose properties are seen as sub-optimal for GPU architectures, using the maximal clique enumeration problem as a case study, and find that the presence of these properties limits GPU performance to approximately 1.4 to 2.25 times that of a single CPU core. The GPU performance "lessons" we find critical to achieving this performance include a coarse- and fine-grain parallelization of the search space; a low-overhead, load-balanced distribution of work; global memory latency hiding through coalescence, saturation, and shared memory utilization; and the use of GPU output buffering as a solution to irregular workloads and a large solution domain. We also find a strong reliance on an efficient global problem structure representation that bounds any efficiencies gained from these lessons, and discuss the implications of these results for backtracking problems in general.
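To make the abstract's terms more concrete, here is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation. It assumes the classic Bron-Kerbosch backtracking algorithm with pivoting, because the abstract does not name the variant used. The sketch shows two of the listed ideas in miniature. First, the search space is split at a coarse grain into one independent subproblem per vertex. Second, results go into a fixed-capacity output buffer that is flushed whenever it fills. The names `OutputBuffer`, `expand`, and `enumerate_maximal_cliques`, the `capacity` parameter, and the simple sorted vertex ordering are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets; vertices are ints


class OutputBuffer:
    """Hypothetical fixed-capacity buffer standing in for a GPU output buffer.
    Results are copied to `sink` (the "host") whenever the buffer fills."""

    def __init__(self, capacity: int, sink: List[List[int]]) -> None:
        self.capacity = capacity
        self.items: List[List[int]] = []
        self.sink = sink

    def push(self, clique: List[int]) -> None:
        self.items.append(clique)
        if len(self.items) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        self.sink.extend(self.items)
        self.items = []


def expand(g: Graph, r: List[int], p: Set[int], x: Set[int], out: OutputBuffer) -> None:
    """Bron-Kerbosch backtracking with pivoting: r is the current clique,
    p holds candidates that extend r, x holds vertices already explored."""
    if not p and not x:
        out.push(sorted(r))  # nothing can extend r, so it is maximal
        return
    # Pivot on the vertex adjacent to the most candidates to prune branches.
    pivot = max(p | x, key=lambda u: len(p & g[u]))
    for v in list(p - g[pivot]):
        expand(g, r + [v], p & g[v], x & g[v], out)
        p.remove(v)  # backtrack: every clique containing v is done
        x.add(v)


def enumerate_maximal_cliques(g: Graph, capacity: int = 4) -> List[List[int]]:
    """Coarse-grain decomposition: one independent subproblem per vertex v,
    reporting only cliques whose lowest-ranked vertex is v."""
    order = sorted(g)  # hypothetical ordering; a degeneracy order is common
    rank = {v: i for i, v in enumerate(order)}
    results: List[List[int]] = []
    for v in order:  # each iteration is independent of the others
        out = OutputBuffer(capacity, results)
        later = {u for u in g[v] if rank[u] > rank[v]}
        earlier = {u for u in g[v] if rank[u] < rank[v]}
        expand(g, [v], later, earlier, out)
        out.flush()  # drain whatever is left in this subproblem's buffer
    return results
```

Consider a triangle {1, 2, 3} with an extra edge 3-4 and an isolated vertex 5. On that graph, `enumerate_maximal_cliques` returns `[[1, 2, 3], [3, 4], [5]]`. Each outer iteration only reads the shared adjacency sets. In principle, then, each iteration could be given its own GPU thread block, and the set intersections inside `expand` could be split across that block's threads. That coarse/fine mapping is an illustrative assumption, not a description of the paper's kernels.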