MOMA: mapping of memory-intensive software-pipelined applications for systems with multiple memory controllers

Authors:
Janmartin Jahn;Santiago Pagani;Jian-Jia Chen;Jörg Henkel
Affiliations:
Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany;Karlsruhe Institute of Technology (KIT), Germany
Venue:
Proceedings of the International Conference on Computer-Aided Design
Year:
2013

Abstract

In many-core systems, efficient deployment of computational and other resources is key to achieving high throughput. State-of-the-art task mapping schemes balance the computational load among cores while avoiding congestion on the communication links. However, a large number of cores running many memory-intensive tasks may congest the memory controllers, because the number and bandwidth of those controllers are constrained. To avoid the severe throughput degradation that congested memory controllers can cause, task mapping must account for the limited bandwidth of off-chip memory. Designing efficient and effective algorithms that optimize throughput by jointly considering the load of memory controllers, computation, and communication is very challenging. In this paper, we address this problem by distributing cores among applications and then heuristically mapping tasks such that the load of the memory controllers is sufficiently balanced. Our heuristic also minimizes the throughput loss that results from mapping communicating tasks to cores that belong to different controllers. Our experiments indicate that we can reduce the saturation of memory controllers and significantly increase system throughput compared to several state-of-the-art task mapping schemes.
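
The idea of balancing memory-controller load during task mapping can be illustrated with a small sketch. This is not the MOMA heuristic itself, which the abstract does not specify. It is a generic greedy balancer under simplifying assumptions: each task has a single known bandwidth demand, each core is attached to exactly one controller, and each core runs one task. The names `Controller`, `map_tasks`, and `cross_controller_edges` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """A memory controller and the cores attached to it (hypothetical model)."""
    name: str
    free_cores: int
    load: float = 0.0  # summed bandwidth demand of the tasks mapped so far
    tasks: list = field(default_factory=list)


def map_tasks(demands, controllers):
    """Greedy sketch: place the heaviest task first, always on the
    least-loaded controller that still has a free core.
    Returns a dict {task name: controller name}."""
    if len(demands) > sum(c.free_cores for c in controllers):
        raise ValueError("not enough cores for all tasks")
    mapping = {}
    # Sort by descending demand; ties are broken by task name for determinism.
    for task, demand in sorted(demands.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = [c for c in controllers if c.free_cores > 0]
        target = min(candidates, key=lambda c: (c.load, c.name))
        target.free_cores -= 1
        target.load += demand
        target.tasks.append(task)
        mapping[task] = target.name
    return mapping


def cross_controller_edges(pipeline, mapping):
    """Count consecutive pipeline stages that sit under different controllers."""
    return sum(1 for a, b in zip(pipeline, pipeline[1:]) if mapping[a] != mapping[b])


if __name__ == "__main__":
    # Assumed example: a four-stage pipeline with made-up bandwidth demands.
    demands = {"s0": 4.0, "s1": 3.0, "s2": 2.0, "s3": 1.0}
    mcs = [Controller("MC0", free_cores=2), Controller("MC1", free_cores=2)]
    mapping = map_tasks(demands, mcs)
    print(mapping)
    print({c.name: c.load for c in mcs})
    print(cross_controller_edges(["s0", "s1", "s2", "s3"], mapping))
```

In this toy input both controllers end up with the same total demand, but two of the three pipeline edges cross a controller boundary. That is the tension the abstract describes: a mapping that only balances controller load can separate communicating tasks, so a practical heuristic has to weigh both effects.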