Scans as Primitive Parallel Operations
IEEE Transactions on Computers
Message passing on data-parallel architectures
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Programming Massively Parallel Processors: A Hands-on Approach
StreamScan: fast scan algorithms for GPUs without global barrier synchronization
Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Current GPU programming systems automatically distribute work across all GPU processors based on a set of fixed assumptions, e.g., that all tasks are independent of each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution adds hardly any overhead. Our Scan+ algorithm is an improved scan that relies on manual work distribution. It uses global barriers and task interleaving to provide almost twice the performance of Apple's reference implementation.
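To make the idea of manual work distribution with global barriers concrete, the following is a minimal CPU-side sketch in Python. It is not the Scan+ algorithm itself, whose details the abstract does not give. It assumes a generic three-phase blocked scan in which a fixed number of workers each own a contiguous chunk and synchronize at barriers. The function name `blocked_inclusive_scan` and the parameter `num_workers` are hypothetical, and task interleaving is not modeled.

```python
import threading


def blocked_inclusive_scan(values, num_workers=4):
    """Inclusive prefix sum with a fixed worker count and global barriers.

    Illustrative sketch only: a generic three-phase blocked scan,
    not the Scan+ algorithm described in the abstract.
    """
    n = len(values)
    out = [0] * n
    if n == 0:
        return out

    # Manual work distribution: a fixed number of workers, each owning
    # one contiguous chunk of the input.
    num_workers = max(1, min(num_workers, n))
    chunk = (n + num_workers - 1) // num_workers
    totals = [0] * num_workers   # per-worker chunk sums
    offsets = [0] * num_workers  # exclusive scan of the chunk sums
    barrier = threading.Barrier(num_workers)

    def worker(w):
        lo = min(w * chunk, n)
        hi = min(lo + chunk, n)

        # Phase 1: each worker reduces its own chunk.
        totals[w] = sum(values[lo:hi])
        barrier.wait()  # global barrier: all chunk sums are now visible

        # Phase 2: a single worker scans the per-chunk totals.
        if w == 0:
            running = 0
            for i in range(num_workers):
                offsets[i] = running
                running += totals[i]
        barrier.wait()  # global barrier: all offsets are now visible

        # Phase 3: each worker scans its chunk, starting from its offset.
        running = offsets[w]
        for i in range(lo, hi):
            running += values[i]
            out[i] = running

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The tasks here are not independent: phase 3 of every worker depends on phase 1 of all the others. This is the kind of dependency that a scheduler assuming independent tasks cannot express without a global synchronization point.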