Today's prevalent solutions for modern embedded systems and general-purpose computing employ many processing units connected by an on-chip network, leaving behind complex superscalar architectures. In this paper, we couple the concept of distributed computing with parallel applications and present a workload-aware distributed run-time framework for malleable applications on many-core platforms. The framework serves the needs of malleable applications in a distributed way at run time. It maximizes resource utilization while avoiding dominating effects, and it takes the type of each processor into account to support platform heterogeneity. It does all of this with only a small overhead in overall inter-core communication. We implemented the framework as part of a C simulator and also as a run-time service on the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs. We then compared it against a state-of-the-art run-time resource manager. Experimental results showed that, on average, our framework exchanged 70% fewer messages with 64% smaller message size and achieved a 20% application speed-up gain.
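The abstract does not spell out the allocation policy itself. The following Python sketch is only a rough illustration of the trade-off a workload-aware allocator for malleable applications has to balance. It hands out cores one at a time to whichever application gains the most speedup, and it caps each application's share of the platform as a simple guard against one application dominating. This is not the framework described above. It is centralized rather than distributed, it assumes homogeneous cores, and it uses Amdahl's law as a stand-in speedup model. The names `MalleableApp`, `speedup`, `allocate_cores` and `max_share` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MalleableApp:
    """Hypothetical malleable application that can run on a variable core count."""
    name: str
    serial_fraction: float  # Amdahl's-law serial fraction (assumed speedup model)
    max_cores: int          # upper bound on cores the app can usefully exploit


def speedup(app: MalleableApp, cores: int) -> float:
    """Amdahl's-law speedup; 0 cores means the app is not running at all."""
    if cores == 0:
        return 0.0
    f = app.serial_fraction
    return 1.0 / (f + (1.0 - f) / cores)


def allocate_cores(apps, total_cores, max_share=0.5):
    """Greedily give each core to the app with the largest marginal speedup.

    No app may exceed max_share of the platform (assumed anti-domination cap)
    or its own max_cores; cores nobody can use are left idle.
    """
    cap = max(1, int(total_cores * max_share))
    alloc = {app.name: 0 for app in apps}
    for _ in range(total_cores):
        best, best_gain = None, 0.0
        for app in apps:
            n = alloc[app.name]
            if n >= min(app.max_cores, cap):
                continue  # this app is saturated or has hit its share cap
            gain = speedup(app, n + 1) - speedup(app, n)
            if gain > best_gain:
                best, best_gain = app, gain
        if best is None:
            break  # every app is saturated; remaining cores stay idle
        alloc[best.name] += 1
    return alloc


if __name__ == "__main__":
    apps = [MalleableApp("A", 0.05, 32), MalleableApp("B", 0.3, 32)]
    print(allocate_cores(apps, 16))                # share cap splits cores evenly
    print(allocate_cores(apps, 10, max_share=1.0)) # no cap: scalable app A dominates
```

With the cap at half the platform, 16 cores are split 8/8. When the cap is lifted, the highly parallel application A takes 8 of 10 cores. This contrast shows why some limit on dominating effects matters when several malleable applications compete for the same many-core chip.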