HPS, a new microarchitecture: rationale and introduction
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
IEEE Micro
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
FCCM '04 Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
CellSs: a programming model for the cell BE architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Serialization sets: a dynamic dependence-based parallel execution model
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamic MIPS rate stabilization in out-of-order processors
Proceedings of the 36th annual international symposium on Computer architecture
An efficient algorithm for exploiting multiple arithmetic units
IBM Journal of Research and Development
Automatic Parallelization in a Binary Rewriter
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Task Superscalar: An Out-of-Order Task Pipeline
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Flexible and Efficient Instruction-Grained Run-Time Monitoring Using On-Chip Reconfigurable Fabric
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Communications of the ACM
SOMP: Service-Oriented Multi Processors
SCC '11 Proceedings of the 2011 IEEE International Conference on Services Computing
Dynamic Fine-Grain Scheduling of Pipeline Parallelism
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Dataflow execution of sequential imperative programs on multicore architectures
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Platune: a tuning framework for system-on-a-chip platforms
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
FPM: A Flexible Programming Model for MPSoC on FPGA
IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Hi-index | 0.00 |
This article presents MP-Tomasulo, a dependency-aware automatic parallel task execution engine for sequential programs. Applying the instruction-level Tomasulo algorithm to MPSoC environments, MP-Tomasulo detects and eliminates Write-After-Write (WAW) and Write-After-Read (WAR) inter-task dependencies in the dataflow execution, therefore to operate out-of-order task execution on heterogeneous units. We implemented the prototype system within a single FPGA. Experimental results on EEMBC applications demonstrate that MP-Tomasulo can execute the tasks out-of-order to achieve as high as 93.6% to 97.6% of ideal peak speedup. A comparative study against a state-of-the-art dataflow execution scheme is illustrated with a classic JPEG application. The promising results show MP-Tomasulo enables programmers to uncover more task-level parallelism on heterogeneous systems, as well as to ease the burden of programmers.