Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Implementing remote procedure calls
ACM Transactions on Computer Systems (TOCS)
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Efficient operating system scheduling for performance-asymmetric multi-core architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Harmony: an execution model and runtime for heterogeneous many core systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Introduction to the wire-speed processor and architecture
IBM Journal of Research and Development
Communications of the ACM
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Pegasus: coordinated scheduling for virtualized accelerator-based systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Dynamically Specialized Datapaths for energy efficient computing
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
PTask: operating system abstractions to manage GPUs as compute devices
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Implementation of H.264 on Mobile Device
IEEE Transactions on Consumer Electronics
Disengaged scheduling for fair, protected access to fast computational accelerators
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
The inexorable demand for computing power has lead to increasing interest in accelerator-based designs. An accelerator is specialized hardware unit that can perform a set of tasks with much higher performance or power efficiency than a general-purpose CPU. They may be embedded in the pipeline as a functional unit, as in SIMD instructions, or attached to the system as a separate device, as in a cryptographic co-processor. Current operating systems provide little support for accelerators: whether integrated into a processor or attached as a device, they are treated as CPU or a device and given no additional consideration. However, future processors may have designs that require more management by the operating system. For example, heterogeneous processors may only provision some cores with accelerators, and IBM's wire-speed processor allows user-mode code to launch computations on a shared accelerator without kernel involvement. In such systems, the OS can improve performance by allocating accelerator resources and scheduling access to the accelerator as it does for memory and CPU time. In this paper, we discuss the challenges presented by adopting accelerators as an execution resource managed by the operating system. We also present the initial design of our system, which provides flexible control over where and when code executes and can apply power and performance policies. It presents a simple software interface that can leverage new hardware interfaces as well as sharing of specialized units in a heterogeneous system.