Efficient Parallel Execution of Irregular Recursive Programs
IEEE Transactions on Parallel and Distributed Systems
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Relational joins on graphics processors
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Real-time KD-tree construction on graphics hardware
ACM SIGGRAPH Asia 2008 papers
Nested data-parallelism on the gpu
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
Efficient scheduling of recursive control flow on GPUs
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Recent research has shown promising results on using graphics processing units (GPUs) to accelerate general-purpose computation. However, today's GPUs do not support recursive functions. As a result, for inherently recursive algorithms such as tree traversal, GPU programmers need to explicitly use stacks to emulate the recursion. Parallelizing such stack-based implementation on the GPU increases the programming difficulty; moreover, it is unclear how to improve the efficiency of such parallel implementations. As a first step to address both ease of programming and efficiency issues, we propose three parallel stack implementation alternatives that differ in the granularity of stack sharing. Taking tree traversals as an example, we study the performance tradeoffs between these alternatives and analyze their behaviors in various situations. Our results could be useful to both GPU programmers and GPU compiler writers.