Chip architectures are shifting from a few fast, functionally heavy cores to many slower, simpler cores in order to address pressing physical limitations such as energy consumption and heat dissipation. As architectural trends continue to fluctuate, we propose a novel program execution model, the Codelet model, which is designed for new systems tasked with efficiently managing varying resources. The Codelet model is a fine-grained, dataflow-inspired model extended to address the cumbersome resources available in new architectures. In the following, we define the Codelet execution model and provide an implementation named DARTS. Using DARTS and two predominant kernels, matrix multiplication and the Graph 500's breadth-first search (BFS), we explore the validity of fine-grained execution as a promising and viable execution model for current and future architectures. We show that our runtime performs on par with or better than AMD's highly optimized parallel library for matrix multiplication, outperforming it on average by 1.40× with a speedup of up to 4×. Our implementation of parallel BFS outperforms the Graph 500 reference implementation (with or without dynamic scheduling) on average by 1.50× with a speedup of up to 2.38×.