Disengaged scheduling for fair, protected access to fast computational accelerators

Authors:
Konstantinos Menychtas; Kai Shen; Michael L. Scott
Affiliations:
University of Rochester, Rochester, NY, USA (all three authors)
Venue:
Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '14)
Year:
2014

Abstract

Today's operating systems treat GPUs and other computational accelerators as if they were simple devices, with bounded and predictable response times. With accelerators assuming an increasing share of the workload on modern machines, this strategy is already problematic, and likely to become untenable soon. If the operating system is to enforce fair sharing of the machine, it must assume responsibility for accelerator scheduling and resource management. Fair, safe scheduling is a particular challenge on fast accelerators, which allow applications to avoid kernel-crossing overhead by interacting directly with the device. We propose a disengaged scheduling strategy in which the kernel intercedes between applications and the accelerator on an infrequent basis, to monitor their use of accelerator cycles and to determine which applications should be granted access over the next time interval. Our strategy assumes a well-defined, narrow interface exported by the accelerator; we build upon such an interface, systematically inferred for the latest Nvidia GPUs. We construct several example schedulers, including Disengaged Timeslice with overuse control, which guarantees fairness, and Disengaged Fair Queueing, which is effective in limiting resource idleness but whose fairness is probabilistic. Both schedulers ensure fair sharing of the GPU, even among uncooperative or adversarial applications; Disengaged Fair Queueing incurs a 4% overhead on average (max 18%) compared to direct device access across our evaluation scenarios.
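
The Disengaged Timeslice idea named in the abstract can be illustrated with a small simulation. This is a toy sketch, not the authors' implementation: the abstract gives no algorithmic detail, so the accounting rule used here (carry any overrun forward as "debt" and forfeit future slices until it is repaid), the function name `simulate_timeslice`, and all parameters are assumptions made for illustration. The model also assumes that requests are non-preemptible, have a fixed positive length per task, and are issued back to back.

```python
from collections import deque


def simulate_timeslice(request_len, slice_len, total_time, overuse_control=True):
    """Return accelerator time consumed per task in a toy timeslice model.

    request_len maps a task name to the fixed, positive run time of each
    request it submits (hypothetical workload description).
    """
    used = {task: 0.0 for task in request_len}
    debt = {task: 0.0 for task in request_len}  # overuse not yet paid back
    turns = deque(request_len)                   # round-robin order of tasks
    clock = 0.0

    while clock < total_time:
        task = turns[0]
        turns.rotate(-1)

        if debt[task] >= slice_len:
            # The task already consumed this whole slice in an earlier turn:
            # it forfeits the turn and its debt shrinks by one slice.
            debt[task] -= slice_len
            continue

        # Disengaged phase: the task talks to the device directly, so the
        # scheduler only learns how much time was used when the slice ends.
        budget = slice_len - debt[task]
        elapsed = 0.0
        while elapsed < budget:
            elapsed += request_len[task]  # last request may overrun the slice

        used[task] += elapsed
        clock += elapsed
        # Re-engagement: charge any overrun against the task's future slices.
        debt[task] = (elapsed - budget) if overuse_control else 0.0

    return used


# Per-request run times in arbitrary time units (made-up workload).
lengths = {"small": 1.0, "large": 25.0}
print(simulate_timeslice(lengths, slice_len=10.0, total_time=1000.0))
print(simulate_timeslice(lengths, slice_len=10.0, total_time=1000.0,
                         overuse_control=False))
```

In this model the first call should split accelerator time roughly evenly between the two tasks, while the second, with overuse control disabled, lets the task with long requests take well over half of the device. The scheduler only acts at slice boundaries, which is the sense in which it stays "disengaged" during normal operation.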