Vectorization of tree traversals
Journal of Computational Physics
Journal of Computational Physics
Journal of Parallel and Distributed Computing
Global illumination using photon maps
Proceedings of the eurographics workshop on Rendering techniques '96
Parametric shape analysis via 3-valued logic
Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Data structures and algorithms for nearest neighbor search in general metric spaces
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Multidimensional binary search trees used for associative searching
Communications of the ACM
Outer-loop vectorization: revisited for short SIMD architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Yet Faster Ray-Triangle Intersection (Using SSE4)
IEEE Transactions on Visualization and Computer Graphics
Cache-oblivious ray reordering
ACM Transactions on Graphics (TOG)
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Enhancing locality for recursive traversals of recursive structures
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
An Evaluation of Vectorizing Compilers
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Efficient SIMD code generation for irregular kernels
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
VAST-Tree: a vector-advanced and compressed structure for massive data tree traversal
Proceedings of the 15th International Conference on Extending Database Technology
Shallow bounding volume hierarchies for fast SIMD ray tracing of incoherent rays
EGSR'08 Proceedings of the Nineteenth Eurographics conference on Rendering
Automatically enhancing locality for tree traversals with traversal splicing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Billion-particle SIMD-friendly two-point correlation on large-scale HPC cluster systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
From relational verification to SIMD loop synthesis
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
SIMD parallelization of applications that traverse irregular data structures
CGO '13 Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
General transformations for GPU execution of tree traversals
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Repeated tree traversals are ubiquitous in many domains such as scientific simulation, data mining and graphics. Modern commodity processors support SIMD instructions, and using these instructions to process multiple traversals at once has the potential to provide substantial performance improvements. Unfortunately these algorithms often feature highly diverging traversals which inhibit efficient SIMD utilization, to the point that other, less profitable sources of vectorization must be exploited instead. Previous work has proposed traversal splicing, a locality transformation for tree traversals, which dynamically reorders traversals based on previous behavior, based on the insight that traversals which have behaved similarly so far are likely to behave similarly in the future. In this work, we cast this dynamic reordering as a scheduling for efficient SIMD execution, and show that it can dramatically improve the SIMD utilization of diverging traversals, close to ideal utilization. For five irregular tree traversal algorithms, our techniques are able to deliver speedups of 2.78 on average over baseline implementations. Furthermore our techniques can effectively SIMDize algorithms that prior, manual vectorization attempts could not.