In this paper we present Softshell, a novel execution model for devices composed of multiple processing cores operating in a single instruction, multiple data (SIMD) fashion, such as graphics processing units (GPUs). The Softshell model is intuitive and more flexible than the kernel-based adaptation of the stream processing model, which is currently the dominant model for general-purpose GPU computation. Using the Softshell model, algorithms with a relatively low local degree of parallelism can execute efficiently on massively parallel architectures. Softshell has the following distinct advantages: (1) work can be issued dynamically on the device itself, eliminating the need for synchronization with an external source, i.e., the CPU; (2) its three-tier dynamic scheduler supports arbitrary scheduling strategies, including dynamic priorities and real-time scheduling; and (3) the user can influence, pause, and cancel work already submitted for parallel execution. The Softshell processing model thus brings capabilities to GPU architectures that were previously known only from operating-system designs and reserved for CPU programming. To support these claims, we present a publicly available implementation of the Softshell processing model realized on top of CUDA. Benchmarks of this implementation demonstrate that our processing model is easy to use and performs substantially better than the state-of-the-art kernel-based processing model for problems that have been difficult to parallelize in the past.
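The three capabilities listed in the abstract can be illustrated with a toy, single-threaded C++ sketch. It is not the Softshell API and does not run on a GPU: `ToyScheduler`, `WorkItem`, `submit`, `cancel`, and `demo` are hypothetical names introduced only for illustration, and a plain priority queue stands in for the three-tier scheduler. The sketch shows work that issues further work from inside a running item, priority-ordered execution, and cancellation of work that has already been submitted.

```cpp
#include <functional>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

class ToyScheduler;  // hypothetical; not part of Softshell

// A unit of work with a priority and an optional body that may
// submit or cancel other work while it runs.
struct WorkItem {
    int id;
    int priority;  // larger values run first
    std::function<void(ToyScheduler&)> body;
};

class ToyScheduler {
public:
    // Queue a work item; callable from outside or from a running item.
    void submit(int id, int priority,
                std::function<void(ToyScheduler&)> body = {}) {
        queue_.push(WorkItem{id, priority, std::move(body)});
    }

    // Mark already-submitted work as cancelled; it is skipped when popped.
    void cancel(int id) { cancelled_.insert(id); }

    // Run until the queue is empty; returns the ids in execution order.
    std::vector<int> run() {
        std::vector<int> order;
        while (!queue_.empty()) {
            WorkItem item = queue_.top();
            queue_.pop();
            if (cancelled_.count(item.id)) continue;  // cancelled work never runs
            order.push_back(item.id);
            if (item.body) item.body(*this);  // may issue new work
        }
        return order;
    }

private:
    struct ByPriority {
        bool operator()(const WorkItem& a, const WorkItem& b) const {
            return a.priority < b.priority;  // max-heap on priority
        }
    };
    std::priority_queue<WorkItem, std::vector<WorkItem>, ByPriority> queue_;
    std::unordered_set<int> cancelled_;
};

// Item 1 spawns high-priority item 4 and cancels pending item 2.
std::vector<int> demo() {
    ToyScheduler s;
    s.submit(1, 5, [](ToyScheduler& sched) {
        sched.submit(4, 10);
        sched.cancel(2);
    });
    s.submit(2, 3);
    s.submit(3, 1);
    return s.run();
}
```

Calling `demo()` yields the execution order 1, 4, 3: the spawned item overtakes the pending ones because of its higher priority, and item 2 never runs. A real device-side scheduler would additionally have to handle concurrent queue access and SIMD execution, which this sketch deliberately leaves out.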