Stream compaction is a common parallel primitive used to remove unwanted elements from sparse data. Removing them lets highly parallel algorithms maintain performance over several processing steps and reduces overall memory usage. We present a novel stream compaction algorithm for wide SIMD many-core architectures and explore several variations of it. The algorithm is designed to maximize concurrent execution with minimal use of synchronization. Bandwidth and auxiliary storage requirements are reduced significantly, which allows for substantially better performance. We tested our algorithms using CUDA on a PC with an NVIDIA GeForce GTX 280 GPU. On this hardware, our reference implementation provides a 3x speedup over previously published algorithms.
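To make the primitive concrete, the following is a minimal sequential sketch of what stream compaction computes, written as the textbook flag, scan, scatter formulation. It is not the algorithm presented in this paper, and it runs on the host rather than in CUDA. The function names `exclusive_scan_flags` and `compact` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum over 0/1 flags: offsets[i] is the number of kept
// elements that appear before position i.
std::vector<std::size_t> exclusive_scan_flags(const std::vector<int>& flags) {
    std::vector<std::size_t> offsets(flags.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        offsets[i] = running;
        running += flags[i] ? 1 : 0;
    }
    return offsets;
}

// Sequential reference for stream compaction (hypothetical helper, not the
// paper's method): keeps the elements for which keep(x) is true, preserving
// their original order.
template <typename T, typename Pred>
std::vector<T> compact(const std::vector<T>& input, Pred keep) {
    // Step 1: flag each element as kept (1) or discarded (0).
    std::vector<int> flags(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        flags[i] = keep(input[i]) ? 1 : 0;
    }

    // Step 2: scan the flags to obtain each kept element's output index.
    std::vector<std::size_t> offsets = exclusive_scan_flags(flags);

    // Step 3: scatter kept elements to their output positions.
    std::size_t count = input.empty() ? 0 : offsets.back() + flags.back();
    std::vector<T> output(count);
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (flags[i]) {
            output[offsets[i]] = input[i];
        }
    }
    return output;
}
```

For example, compacting `{3, 0, 0, 7, 0, 5}` with a predicate that keeps nonzero values yields `{3, 7, 5}`. In a parallel setting, the flagging and scatter loops are independent per element, and the scan is the step that requires coordination between processing elements.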