PTask: operating system abstractions to manage GPUs as compute devices

Authors:
Christopher J. Rossbach; Jon Currey; Mark Silberstein; Baishakhi Ray; Emmett Witchel
Affiliations:
Microsoft Research; Microsoft Research; Technion; University of Texas at Austin; University of Texas at Austin
Venue:
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Year:
2011


Abstract

We propose a new set of OS abstractions to support GPUs and other accelerator devices as first-class computing resources. These abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees such as fairness and performance isolation, and can streamline data movement in ways that are impossible under current GPU programming models. Our experience developing the PTask API, along with a gestural interface on Windows 7 and a FUSE-based encrypted file system on Linux, shows that the PTask API can provide important system-wide guarantees where there were previously none and can enable significant performance improvements, for example a 5× improvement in maximum throughput for the gestural interface.
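
To make the dataflow programming model mentioned in the abstract more concrete, the following is a minimal sketch of a generic dataflow graph in Python. It is not the PTask API: the names `Channel`, `Task`, and `run` are hypothetical and chosen only for illustration, and the sketch runs entirely on the CPU. It shows only the general idea that tasks are nodes connected by channels and that a task fires once all of its inputs hold data, which is what lets a runtime (here the `run` function, in PTask the OS) decide when and where each node executes.

```python
from collections import deque


class Channel:
    """Hypothetical FIFO edge connecting one task's output to another's input."""

    def __init__(self):
        self.items = deque()

    def push(self, item):
        self.items.append(item)

    def pull(self):
        return self.items.popleft()

    def ready(self):
        return len(self.items) > 0


class Task:
    """Hypothetical graph node: fires when every input channel holds data."""

    def __init__(self, name, fn, inputs, outputs):
        self.name = name
        self.fn = fn
        self.inputs = inputs
        self.outputs = outputs

    def can_fire(self):
        # Require at least one input so a source-less task cannot fire forever.
        return bool(self.inputs) and all(ch.ready() for ch in self.inputs)

    def fire(self):
        args = [ch.pull() for ch in self.inputs]
        result = self.fn(*args)
        for ch in self.outputs:
            ch.push(result)


def run(tasks):
    """Fire ready tasks repeatedly until no task can make progress."""
    fired = []
    progress = True
    while progress:
        progress = False
        for task in tasks:
            if task.can_fire():
                task.fire()
                fired.append(task.name)
                progress = True
    return fired


# Two-stage pipeline: src -> scale -> mid -> offset -> sink
src, mid, sink = Channel(), Channel(), Channel()
scale = Task("scale", lambda x: x * 2, [src], [mid])
offset = Task("offset", lambda x: x + 1, [mid], [sink])

for value in (1, 2, 3):
    src.push(value)

# The listing order of tasks does not matter; data availability drives execution.
order = run([offset, scale])
results = list(sink.items)
print(results)
```

Because the graph, rather than the application's control flow, describes how data moves between stages, the component that drives execution can see the whole pipeline. In this sketch that component is a simple loop; the abstract's claim is that placing the equivalent objects under OS management gives the kernel the same visibility.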