CHARM++: a portable concurrent object oriented system based on C++
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Pipeline behavior prediction for superscalar processors by abstract interpretation
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamic decentralized cache schemes for mimd parallel processors
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Timing Anomalies in Dynamically Scheduled Microprocessors
RTSS '99 Proceedings of the 20th IEEE Real-Time Systems Symposium
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
Automatic performance model construction for the fast software exploration of new hardware designs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
CAPSULE: Hardware-Assisted Parallel Execution of Component-Based Programs
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Scratchpad memories vs locked caches in hard real-time systems: a quantitative comparison
Proceedings of the conference on Design, automation and test in Europe
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Hardware/software support for adaptive work-stealing in on-chip multiprocessor
Journal of Systems Architecture: the EUROMICRO Journal
Scalable hardware support for conditional parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Resource-aware programming and simulation of MPSoC architectures through extension of X10
Proceedings of the 14th International Workshop on Software and Compilers for Embedded Systems
Support for OpenMP tasks on cell architecture
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
Hi-index | 0.00 |
Increasingly complex consumer electronics applications call for embedded processors with higher performance. Multi-cores are capable of delivering the required performance. However, many of these embedded applications must meet some form of soft real-time constraints, and program behavior on multi-cores is even harder to predict than on single-cores. In this article, we highlight the greater performance variability of irregular applications (non-regular control flow and/or data structures) across data sets when parallelized and run on a multi-core. We then show that a proper parallelization approach coupled with a lightweight run-time system can drastically reduce this performance variability without sacrificing their performance. This approach requires no complex program or architecture analysis or modeling. Moreover, we show that parallel program performance becomes stable enough that it is possible to reasonably and accurately predict it by sampling a few training runs.