Algorithm 457: finding all cliques of an undirected graph
Communications of the ACM
Efficiently Mining Maximal Frequent Itemsets
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
KD-tree acceleration structures for a GPU raytracer
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Interactive k-d tree GPU raytracing
Proceedings of the 2007 symposium on Interactive 3D graphics and games
Automated social hierarchy detection through email network analysis
Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis
Real-time KD-tree construction on graphics hardware
ACM SIGGRAPH Asia 2008 papers
A scalable, parallel algorithm for maximal clique enumeration
Journal of Parallel and Distributed Computing
Accelerating large graph algorithms on the GPU using CUDA
HiPC'07 Proceedings of the 14th international conference on High performance computing
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Proceedings of the 37th annual international symposium on Computer architecture
On the limits of GPU acceleration
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Maximal clique enumeration in finding near neighbourhoods
Transactions on Rough Sets XVI
Hi-index | 0.00 |
We explore the backtracking paradigm with properties seen as sub-optimal for GPU architectures, using as a case study the maximal clique enumeration problem, and find that the presence of these properties limit GPU performance to approximately 1.4-2.25 times a single CPU core. The GPU performance "lessons" we find critical to providing this performance include a coarse-and-fine-grain parallelization of the search space, a low-overhead load-balanced distribution of work, global memory latency hiding through coalescence, saturation, and shared memory utilization, and the use of GPU output buffering as a solution to irregular workloads and a large solution domain. We also find a strong reliance on an efficient global problem structure representation that bounds any efficiencies gained from these lessons, and discuss the meanings of these results to backtracking problems in general.