DANBI: dynamic scheduling of irregular stream programs for many-core systems

Authors:
Changwoo Min;Young Ik Eom
Affiliations:
Sungkyunkwan University and Samsung Electronics, Suwon, South Korea;Sungkyunkwan University, Suwon, South Korea
Venue:
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Year:
2013

Citing 28
Cited 0

Optimal parallel merging and sorting without memory conflicts

IEEE Transactions on Computers
A bridging model for parallel computation

Communications of the ACM
Synchronization Algorithms for Shared-Memory Multiprocessors

Computer
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
The implementation of the Cilk-5 multithreaded language

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
SEDA: an architecture for well-conditioned, scalable internet services

SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
X10: an object-oriented approach to non-uniform cluster computing

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Adaptive Control of Extreme-scale Stream Processing Systems

ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
Orchestrating the execution of stream programs on multicore platforms

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
GRAMPS: A programming model for graphics pipelines

ACM Transactions on Graphics (TOG)
Elastic scaling of data parallel operators in stream processing

IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Flexible filters: load balancing through backpressure for stream programs

EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Rodinia: A benchmark suite for heterogeneous computing

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
An empirical characterization of stream programs and its implications for language and compiler design

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Task management for irregular-parallel workloads on the GPU

Proceedings of the Conference on High Performance Graphics
An analysis of Linux scalability to many cores

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
FlexSC: flexible system call scheduling with exception-less system calls

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
S4: Distributed Stream Computing Platform

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
Sponge: portable stream programming on graphics engines

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
The tao of parallelism in algorithms

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Compiler techniques for scalable performance of stream programs on multicore architectures

Compiler techniques for scalable performance of stream programs on multicore architectures
Dynamic Fine-Grain Scheduling of Pipeline Parallelism

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Auto-parallelizing stateful distributed streaming applications

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Generating, optimizing, and scheduling a compiler level representation of stream parallelism

Generating, optimizing, and scheduling a compiler level representation of stream parallelism

Quantified Score

Hi-index	0.00

Visualization

Abstract

The stream programming model has received a lot of interest because it naturally exposes task, data, and pipeline parallelism. However, most prior work has focused on static scheduling of regular stream programs. Therefore, irregular applications cannot be handled in static scheduling, and the load imbalance caused by static scheduling faces scalability limitations in many-core systems. In this paper, we introduce the DANBI programming model which supports irregular stream programs and propose dynamic scheduling techniques. Scheduling irregular stream programs is very challenging and the load imbalance becomes a major hurdle to achieve scalability. Our dynamic load-balancing scheduler exploits producer-consumer relationships already expressed in the stream program to achieve scalability. Moreover, it effectively avoids the thundering-herd problem and dynamically adapts to load imbalance in a probabilistic manner. It surpasses prior static stream scheduling approaches which are vulnerable to load imbalance and also surpasses prior dynamic stream scheduling approaches which have many restrictions on supported program types, on the scope of dynamic scheduling, and on preserving data ordering. Our experimental results on a 40-core server show that DANBI achieves an almost linear scalability and outperforms state-of-the-art parallel runtimes by up to 2.8 times.