Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
MPI: a message passing interface
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Advances in dataflow programming languages
ACM Computing Surveys (CSUR)
The OpenMP Source Code Repository
PDP '05 Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing
Queue - Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
SBAC-PAD '08 Proceedings of the 2008 20th International Symposium on Computer Architecture and High Performance Computing
An efficient algorithm for exploiting multiple arithmetic units
IBM Journal of Research and Development
Hi-index | 0.00 |
Parallel programming has become mandatory to fully exploit the potential of multi-core CPUs. The dataflow model provides a natural way to exploit parallelism. However, specifying dependences and control using fine-grained instructions in dataflow programs can be complex and present unwanted overheads. To address this issue, we have designed TALM: a coarse-grained dataflow execution model to be used on top of widespread architectures. We implemented TALM as the Trebuchet virtual machine for multi-cores. The programmer identifies code blocks that can run in parallel and connects them to form a dataflow graph, which allows one to have the benefits of parallel dataflow execution in a Von Neumann machine, with small programming effort. We parallelised a set of seven applications using our approach and compared with OpenMP implementations. Results show that Trebuchet can be competitive with state-of-the-art technology, while providing the benefits of dataflow execution.