Relaxing SIMD control flow constraints using loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Hi-index | 0.00 |
Fine-grain data parallelism is increasingly common in mainstream processors in the form of long vectors and on-chip GPUs. This paper develops compiler and runtime support to exploit such data parallelism for non-numeric, non-graphic, irregular parallel tasks that perform simple computations while traversing many independent, irregular data structures, like trees and graphs. We vectorize the traversal of trees and graphs by treating a set of irregular data structures as a parallel control-flow graph and compiling the traversal into a domain-specific bytecodes. We produce a SIMD interpreter for these bytecodes, so each lane of a SIMD unit traverses one irregular data structure. Despite the overhead of interpretation, we demonstrate significant increases in single-core performance over optimized baselines.