Optimizations for configuring and mapping software pipelines in many core systems

Authors:
Janmartin Jahn;Santiago Pagani;Sebastian Kobbe;Jian-Jia Chen;Jörg Henkel
Affiliations:
Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany
Venue:
Proceedings of the 50th Annual Design Automation Conference
Year:
2013

Citing 18
Cited 2

Approximate algorithms scheduling parallelizable tasks

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Theory and Practice in Parallel Job Scheduling

IPPS '97 Proceedings of the Job Scheduling Strategies for Parallel Processing
Dynamic Load Balancing and Efficient Load Estimators for Asynchronous Iterative Algorithms

IEEE Transactions on Parallel and Distributed Systems
Automatic Thread Extraction with Decoupled Software Pipelining

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite

WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
SPEC CPU2006 benchmark descriptions

ACM SIGARCH Computer Architecture News
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Thread scheduling for multi-core platforms

HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Efficient operating system scheduling for performance-asymmetric multi-core architectures

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
MAPS: an integrated framework for MPSoC application parallelization

Proceedings of the 45th annual Design Automation Conference
User-aware dynamic task allocation in networks-on-chip

Proceedings of the conference on Design, automation and test in Europe
DistRM: distributed resource management for on-chip many-core systems

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Optimal task assignment in multithreaded processors: a statistical approach

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Scenario-based design flow for mapping streaming applications onto on-chip many-core systems

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Dynamic scheduling of stream programs on embedded multi-core processors

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Runtime resource allocation for software pipelines

Proceedings of the 16th International Workshop on Software and Compilers for Embedded Systems
MOMA: mapping of memory-intensive software-pipelined applications for systems with multiple memory controllers

Proceedings of the International Conference on Computer-Aided Design

Quantified Score

Hi-index	0.00

Visualization

Abstract

Efficiently utilizing the computational resources of many core systems is one of the most prominent challenges. The problem worsens when resource requirements vary unpredictably and applications may be started/stopped at any time. To address this challenge, we propose two schemes that calculate and adapt task mappings at runtime: a centralized, optimal mapping scheme and a distributed, hierarchical mapping scheme that trades optimality for a high degree of scalability. Experiments on Intel's 48-core Single-Chip Cloud Computer and in a many core simulator show that a significant improvement in system performance can be achieved over current state-of-the-art.