Processing data streams with hard real-time constraints on heterogeneous systems

Authors:
Uri Verner; Assaf Schuster; Mark Silberstein
Affiliations:
Technion, Haifa, Israel; Computer Science, Haifa, Israel; Technion, Haifa, Israel
Venue:
Proceedings of the International Conference on Supercomputing
Year:
2011

Abstract

Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency, to satisfy the millisecond-scale hard real-time constraints of each stream, and high throughput, to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs have high potential to speed up these computations. However, their use for hard real-time stream processing is complicated by slow communication with CPUs, throughput that varies non-linearly with the input size, and weak consistency of their local memory with respect to CPU accesses. Furthermore, their coarse-grained hardware scheduler makes them unsuitable for unbalanced multi-stream workloads. We present a general, efficient, and practical algorithm for hard real-time stream scheduling in heterogeneous systems. The algorithm assigns incoming streams of different rates and deadlines to CPUs and accelerators. By employing novel stream schedulability criteria for accelerators, the algorithm finds an assignment that simultaneously satisfies the aggregate throughput requirements of all the streams and the deadline constraint of each individual stream. Using the AES-CBC encryption kernel, we experimented extensively on thousands of streams with realistic rate and deadline distributions. Our framework outperformed the alternative methods, allowing 50% more streams to be processed with provably deadline-compliant execution even for deadlines as short as tens of milliseconds. Overall, combined GPU-CPU execution yields up to a 4-fold throughput increase over highly optimized multi-threaded CPU-only implementations.
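The abstract does not spell out the accelerator schedulability criteria, so the following is only a minimal sketch of the general idea of partitioning streams between a GPU and CPU cores, under simplified, hypothetical assumptions. It is not the authors' algorithm. The names `Stream`, `gpu_batch_time`, `gpu_schedulable`, `assign`, and all cost constants are illustrative inventions. The GPU check assumes that data is collected over one period and processed during the next, so worst-case latency is twice the period, and that a batch costs a fixed launch/transfer overhead plus a linear term. The CPU check is a per-core utilization bound, which is sufficient for EDF only when deadlines equal periods.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stream:
    rate: float      # input rate in bytes/s
    deadline: float  # relative deadline in seconds


# Hypothetical cost model; constants are illustrative assumptions only.
GPU_OVERHEAD_S = 0.002   # assumed fixed per-batch kernel launch + PCIe transfer cost
GPU_BYTES_PER_S = 2e9    # assumed GPU throughput once a batch is running
CPU_BYTES_PER_S = 2e8    # assumed throughput of one CPU core


def gpu_batch_time(nbytes: float) -> float:
    """Assumed time to process one batch: fixed overhead plus a linear term."""
    return GPU_OVERHEAD_S + nbytes / GPU_BYTES_PER_S


def gpu_schedulable(streams: List[Stream]) -> bool:
    """Simplified check: a batch gathered over one period must finish within
    the next period, so worst-case latency is 2 * period <= min deadline."""
    if not streams:
        return True
    period = min(s.deadline for s in streams) / 2
    batch_bytes = sum(s.rate for s in streams) * period
    return gpu_batch_time(batch_bytes) <= period


def assign(streams: List[Stream], n_cores: int
           ) -> Tuple[List[Stream], List[List[Stream]], List[Stream]]:
    """Greedy first-fit partitioning: try the GPU first, then CPU cores."""
    gpu: List[Stream] = []
    cores: List[List[Stream]] = [[] for _ in range(n_cores)]
    load = [0.0] * n_cores
    rejected: List[Stream] = []
    # Longest deadlines first: they tolerate batching delay best.
    for s in sorted(streams, key=lambda x: x.deadline, reverse=True):
        if gpu_schedulable(gpu + [s]):
            gpu.append(s)
            continue
        u = s.rate / CPU_BYTES_PER_S  # CPU utilization this stream needs
        for i in range(n_cores):
            if load[i] + u <= 1.0:
                cores[i].append(s)
                load[i] += u
                break
        else:
            rejected.append(s)  # fits neither the GPU nor any core
    return gpu, cores, rejected
```

For example, with 100 streams at 1 MB/s and a 50 ms deadline plus a few streams with a 3 ms deadline, the long-deadline streams fit in one GPU batch, while the short-deadline streams cannot absorb the assumed 2 ms per-batch overhead and fall back to a CPU core. This illustrates the trade-off the abstract describes: batching on the accelerator buys throughput at the cost of latency.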