Scheduling processing of real-time data streams on heterogeneous multi-GPU systems

Authors:
Uri Verner;Assaf Schuster;Mark Silberstein;Avi Mendelson
Affiliations:
Israel Institute of Technology;Israel Institute of Technology;University of Texas at Austin;Israel Institute of Technology
Venue:
Proceedings of the 5th Annual International Systems and Storage Conference
Year:
2012

Abstract

Processing vast numbers of data streams is a common problem in modern computer systems, known as the "online big data problem." Adding hard real-time constraints makes scheduling this processing very challenging, and that is the problem this paper addresses. In such an environment, each data stream is processed by a (different) application, and each datum (data packet) must be processed within a known deadline from the time it was generated. This work assumes a central compute engine consisting of a set of CPUs and a set of GPUs. The system receives a configuration of multiple incoming streams and runs a scheduler on the CPU side. The scheduler decides where each data stream will be processed (on the CPUs or on one of the GPUs) and in what order, in a way that guarantees that no deadline is missed. Our scheduler finds such schedules even for workloads that require high utilization of the entire system (CPUs and GPUs). The paper focuses on an environment where all CPUs share a main memory and are controlled by a single operating system (and a single scheduler). The system uses a set of discrete graphics cards, each with its own private main memory. The memory regions are not shared, and coherency is maintained through explicit memory-copy operations. The paper presents a new algorithm for distributing data and scheduling applications that achieves high utilization of the entire system while producing schedules that meet hard real-time constraints. We evaluate the proposed algorithm using the AES-CBC encryption kernel on thousands of streams with realistic distributions of rates and deadlines. On a system with a CPU and two GPU cards, our current framework processes up to 87% more data per time unit than a similar single-GPU system.
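To make the placement decision described in the abstract concrete, the following is a minimal sketch of one generic way to partition streams across devices. It is not the algorithm from the paper, which the abstract does not specify. It uses a first-fit-decreasing heuristic with a utilization cap and a per-device latency filter. The names `Stream`, `Device`, `assign_streams`, and the fields `throughput` and `latency` are all hypothetical, and the capacity check is only a necessary condition: passing it does not by itself guarantee that hard deadlines are met.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    name: str
    rate: float      # data generated per second (assumed unit: bytes/s)
    deadline: float  # seconds allowed between generation and completion


@dataclass
class Device:
    name: str
    throughput: float  # data processed per second at full load (hypothetical)
    latency: float     # fixed per-batch overhead, e.g. memory copies (hypothetical)
    load: float = 0.0  # fraction of capacity already committed
    streams: list = field(default_factory=list)


def assign_streams(streams, devices, cap=1.0):
    """Place each stream on the first device that can take it.

    Streams are considered in decreasing order of rate (first-fit decreasing).
    A device is skipped if its fixed latency already exceeds the stream's
    deadline, or if adding the stream would push its utilization above `cap`.
    Returns a dict mapping stream name to device name; raises ValueError
    if some stream fits nowhere.
    """
    placement = {}
    for s in sorted(streams, key=lambda s: s.rate, reverse=True):
        for d in devices:
            if s.deadline < d.latency:
                continue  # device cannot respond in time even when idle
            utilization = s.rate / d.throughput
            if d.load + utilization <= cap:
                d.load += utilization
                d.streams.append(s.name)
                placement[s.name] = d.name
                break
        else:
            raise ValueError(f"no device can accommodate stream {s.name}")
    return placement


if __name__ == "__main__":
    streams = [
        Stream("a", rate=100, deadline=0.001),
        Stream("b", rate=400, deadline=0.1),
        Stream("c", rate=300, deadline=0.1),
    ]
    devices = [
        Device("gpu0", throughput=500, latency=0.01),
        Device("cpu", throughput=400, latency=0.0005),
    ]
    print(assign_streams(streams, devices))
```

In this toy configuration the tight-deadline stream `a` is kept off the GPU because the assumed copy latency alone exceeds its deadline, which mirrors the abstract's point that explicit memory-copy operations are part of the scheduling problem. A real scheduler would also need a schedulability test for the chosen execution order.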