Many MPSoC applications are loop-intensive and amenable to automatic parallelization with suitable compiler support. Barrier instructions, which perform global synchronization across the parallel processors, are a key component of any compiler-parallelized code, so this scenario calls for a lightweight synchronization infrastructure. In this work we describe a lightweight barrier support library for a non-cache-coherent MPSoC architecture. The library is coupled with a parallelizing compiler front-end to set up a complete automated flow which, starting from sequential code, produces parallelized binary code that can be executed directly on an MPSoC target (a multi-core, non-cache-coherent ARM7 platform). We characterize this tool flow in terms of system performance and energy consumption.
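The abstract does not describe the library's API or its barrier algorithm. As a minimal sketch of the general pattern it refers to, the C code below shows compiler-style block partitioning of loop iterations by core ID, with a centralized sense-reversing barrier between two parallel loops. The sense-reversing barrier is a classic textbook technique chosen here for illustration, not necessarily the one the library uses. All names (`barrier_t`, `barrier_wait`, `core_main`, `parallel_sum`, `NUM_CORES`, `N`) are hypothetical. POSIX threads and C11 atomics stand in for the ARM7 cores and for whatever hardware primitive the target provides. On a non-cache-coherent platform, the shared counter and flag would presumably have to live in uncached shared memory; that is an assumption, not a detail taken from the paper.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_CORES 4   /* hypothetical core count */
#define N 1000        /* hypothetical loop trip count */

/* Centralized sense-reversing barrier (illustrative; on the real target the
 * fields would be placed in uncached shared memory, an assumption). */
typedef struct {
    atomic_int count;   /* cores that have not yet arrived in this episode */
    atomic_bool sense;  /* global flag, flipped once per barrier episode */
    int ncores;
} barrier_t;

static void barrier_init(barrier_t *b, int ncores) {
    atomic_init(&b->count, ncores);
    atomic_init(&b->sense, false);
    b->ncores = ncores;
}

/* local_sense is private to each core and toggled on every call. */
static void barrier_wait(barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last core to arrive: reset the counter, then release the others. */
        atomic_store(&b->count, b->ncores);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Spin until the last arrival flips the global flag. */
        while (atomic_load(&b->sense) != *local_sense)
            ;
    }
}

static barrier_t bar;
static int a[N];
static long partial[NUM_CORES];

typedef struct { int id; } core_arg_t;

/* Code each core would run after compiler parallelization (hypothetical). */
static void *core_main(void *arg) {
    int id = ((core_arg_t *)arg)->id;
    bool local_sense = false;

    /* Block partitioning: core `id` executes iterations [lo, hi). */
    int chunk = (N + NUM_CORES - 1) / NUM_CORES;
    int lo = id * chunk;
    int hi = lo + chunk < N ? lo + chunk : N;

    for (int i = lo; i < hi; i++)       /* parallel loop 1 writes a[] */
        a[i] = i;
    barrier_wait(&bar, &local_sense);   /* every element of a[] is now written */

    long s = 0;
    for (int i = lo; i < hi; i++)       /* parallel loop 2 reads other cores' data */
        s += a[N - 1 - i];
    partial[id] = s;
    barrier_wait(&bar, &local_sense);   /* barrier closing the parallel region */
    return NULL;
}

/* Runs both parallel loops and returns the sum of a[0..N-1]. */
long parallel_sum(void) {
    pthread_t t[NUM_CORES];
    core_arg_t args[NUM_CORES];
    barrier_init(&bar, NUM_CORES);
    for (int c = 0; c < NUM_CORES; c++) {
        args[c].id = c;
        pthread_create(&t[c], NULL, core_main, &args[c]);
    }
    for (int c = 0; c < NUM_CORES; c++)
        pthread_join(t[c], NULL);

    long total = 0;
    for (int c = 0; c < NUM_CORES; c++)
        total += partial[c];
    return total;
}
```

The second loop reads elements written by other cores, so without the first `barrier_wait` a core could read entries of `a[]` that have not been written yet. Reversing the sense on each call lets the same barrier object be reused for every loop without a separate reset phase.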