In many-core systems, efficient deployment of computational and other resources is key to achieving high throughput. Current state-of-the-art task mapping schemes balance the computational load among cores while avoiding congestion on the communication links. However, a large number of cores running many memory-intensive tasks may congest the memory controllers, because the number of controllers and their bandwidth are limited. To avoid the severe throughput degradation that congested memory controllers can cause, task mapping must take the limited bandwidth of off-chip memory into account. Designing efficient and effective algorithms that optimize throughput by jointly considering the load of memory controllers, computation, and communication is very challenging. In this paper, we address this problem by distributing cores among applications and then heuristically mapping tasks such that the load of the memory controllers is sufficiently balanced. Our heuristic also minimizes the throughput loss that results from mapping communicating tasks to cores that belong to different controllers. Our experiments indicate that we can reduce the saturation of memory controllers and significantly increase system throughput compared to several state-of-the-art task mapping schemes.
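To make the idea of balancing memory-controller load concrete, the following is a minimal sketch and not the heuristic evaluated in this paper. It assumes each task has a known memory bandwidth demand and each core is served by exactly one memory controller. It ignores computational load and inter-task communication, which the actual approach considers jointly. The function name `map_tasks` and its parameters are hypothetical.

```python
from typing import Dict, List


def map_tasks(demand: Dict[str, float],
              cores_by_controller: Dict[str, List[int]]) -> Dict[str, int]:
    """Greedy illustration (assumed, not the paper's algorithm).

    demand: hypothetical per-task memory bandwidth demand.
    cores_by_controller: hypothetical map from controller name to the
        core ids it serves.
    Returns a map from task name to the core id it is placed on.
    """
    # Copy the core lists so the caller's data is not modified.
    free = {mc: list(cores) for mc, cores in cores_by_controller.items()}
    load = {mc: 0.0 for mc in cores_by_controller}
    mapping: Dict[str, int] = {}

    # Place the most memory-hungry tasks first; break ties by task name.
    for task in sorted(demand, key=lambda t: (-demand[t], t)):
        candidates = [mc for mc in free if free[mc]]
        if not candidates:
            raise ValueError("more tasks than cores")
        # Pick the least-loaded controller that still has a free core.
        mc = min(candidates, key=lambda m: (load[m], m))
        mapping[task] = free[mc].pop(0)
        load[mc] += demand[task]
    return mapping


if __name__ == "__main__":
    tasks = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
    cores = {"mc0": [0, 1], "mc1": [2, 3]}
    print(map_tasks(tasks, cores))
```

In this toy instance both controllers end up with a total demand of 5.0. A scheme closer to the one described above would additionally penalize placing communicating tasks on cores that belong to different controllers.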