Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
Threads and input/output in the synthesis kernal
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
The CODE 2.0 graphical parallel programming language
ICS '92 Proceedings of the 6th international conference on Supercomputing
The ESTEREL synchronous programming language: design, semantics, implementation
Science of Computer Programming
Fbufs: a high-bandwidth cross-domain transfer facility
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Extensibility safety and performance in the SPIN operating system
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Making paths explicit in the Scout operating system
OSDI '96 Proceedings of the second USENIX symposium on Operating systems design and implementation
A case for intelligent disks (IDISKs)
ACM SIGMOD Record
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Transactions on Computer Systems (TOCS)
Programming Microsoft Directshow
Programming Microsoft Directshow
P-RIO: A Modular Parallel-Programming Environment
IEEE Concurrency
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Stream Computations Organized for Reconfigurable Execution (SCORE)
FPL '00 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
Static array storage optimization in MATLAB
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Bilateral Filtering for Gray and Color Images
ICCV '98 Proceedings of the Sixth International Conference on Computer Vision
An Efficient Zero-Copy I/O Framework for UNIX
An Efficient Zero-Copy I/O Framework for UNIX
Queue - Open Source
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
A Time-Of-Flight Depth Sensor - System Description, Issues and Solutions
CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04) Volume 3 - Volume 03
Journal of Parallel and Distributed Computing
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Programming using RapidMind on the Cell BE
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Fast computation of database operations using graphics processors
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Tapping into the fountain of CPUs: on operating system support for programmable devices
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Accelerating computing with the cell broadband engine processor
Proceedings of the 5th conference on Computing frontiers
Relational joins on graphics processors
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Liquid Metal: Object-Oriented Programming Across the Hardware/Software Boundary
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Parallel Computing Experiences with CUDA
IEEE Micro
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
CUDA-Lite: Reducing GPU Programming Complexity
Languages and Compilers for Parallel Computing
Predictive Runtime Code Scheduling for Heterogeneous Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
hiCUDA: a high-level directive-based language for GPU programming
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
BiDi screen: a thin, depth-sensing LCD for 3D interaction using light fields
ACM SIGGRAPH Asia 2009 papers
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Helios: heterogeneous multiprocessing with satellite kernels
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Scientific and Engineering Computing Using ATI Stream Technology
IEEE Design & Test
An asymmetric distributed shared memory model for heterogeneous parallel systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
PacketShader: a GPU-accelerated software router
Proceedings of the ACM SIGCOMM 2010 conference
Lime: a Java-compatible and synthesizable language for heterogeneous architectures
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Sponge: portable stream programming on graphics engines
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
SSLShader: cheap SSL acceleration with commodity processors
Proceedings of the 8th USENIX conference on Networked systems design and implementation
TimeGraph: GPU scheduling for real-time multi-tasking environments
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Shredder: GPU-accelerated incremental storage and computation
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
A scalable framework for heterogeneous GPU-based clusters
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Optimizing latency and throughput for spawning processes on massively multicore processors
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
Operating systems should manage accelerators
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Gdev: first-class GPU resource management in the operating system
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Softshell: dynamic scheduling on GPUs
ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH Asia 2012
GPUstore: harnessing GPU computing for storage systems in the OS kernel
Proceedings of the 5th Annual International Systems and Storage Conference
GPUfs: integrating a file system with GPUs
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Towards adaptive GPU resource management for embedded real-time systems
ACM SIGBED Review
Optimizing process creation and execution on multi-core architectures
International Journal of High Performance Computing Applications
Zero-copy I/O processing for low-latency GPU computing
Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
RSVM: a region-based software virtual memory for GPU
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Proceedings of the Seventh Workshop on Programming Languages and Operating Systems
Enabling OS research by inferring interactions in the black-box GPU stack
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Fast and flexible: parallel packet processing with GPUs and click
ANCS '13 Proceedings of the ninth ACM/IEEE symposium on Architectures for networking and communications systems
Heterogeneous system coherence for integrated CPU-GPU systems
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Disengaged scheduling for fair, protected access to fast computational accelerators
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
GPUfs: Integrating a file system with GPUs
ACM Transactions on Computer Systems (TOCS)
Hi-index | 0.00 |
We propose a new set of OS abstractions to support GPUs and other accelerator devices as first class computing resources. These new abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees like fairness and performance isolation, and can streamline data movement in ways that are impossible under current GPU programming models. Our experience developing the PTask API, along with a gestural interface on Windows 7 and a FUSE-based encrypted file system on Linux show that the PTask API can provide important system-wide guarantees where there were previously none, and can enable significant performance improvements, for example gaining a 5× improvement in maximum throughput for the gestural interface.