A comparison of message passing and shared memory architectures for data parallel programs
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A Unified Framework for Optimizing Communication in Data-Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Hardware/software partitioning and pipelining
DAC '97 Proceedings of the 34th annual Design Automation Conference
ICS '99 Proceedings of the 13th international conference on Supercomputing
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems
IEEE Transactions on Parallel and Distributed Systems
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
Classifying Software-Based Cache Coherence Solutions
IEEE Software
Hardware Support for Interprocess Communication
IEEE Transactions on Parallel and Distributed Systems
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Eclipse: Heterogeneous Multiprocessor Architecture for Flexible Media Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Message Passing Vs. Shared Address Space on a Clusters of SMPs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Parallelizing MPEG Video Encoding using Multiprocessors
SIBGRAPI '99 Proceedings of the XII Brazilian Symposium on Computer Graphics and Image Processing
CANPC '98 Proceedings of the Second International Workshop on Network-Based Parallel Computing: Communication, Architecture, and Applications
Strategies for Improving Data Locality in Embedded Applications
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
High Performance Memory Systems
High Performance Memory Systems
Analyzing On-Chip Communication in a MPSoC Environment
Proceedings of the conference on Design, automation and test in Europe - Volume 2
System Level Power Modeling and Simulation of High-End Industrial Network-on-Chip
Proceedings of the conference on Design, automation and test in Europe - Volume 3
Cycle-accurate power analysis for multiprocessor systems-on-a-chip
Proceedings of the 14th ACM Great Lakes symposium on VLSI
An integrated hardware/software approach for run-time scratchpad management
Proceedings of the 41st annual Design Automation Conference
Proceedings of the 41st annual Design Automation Conference
Communication Centric Architectures for Turbo-Decoding on Embedded Multiprocessors
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
2D data locality: definition, abstraction, and application
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Energy efficient synchronization techniques for embedded architectures
Proceedings of the 18th ACM Great Lakes symposium on VLSI
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
Hi-index | 14.99 |
In today's multiprocessor SoCs (MPSoCs), parallel programming models are needed to fully exploit hardware capabilities and to achieve the 100 Gops/W energy efficiency target required for Ambient Intelligence Applications. However, mapping abstract programming models onto tightly power-constrained hardware architectures imposes overheads which might seriously compromise performance and energy efficiency. The objective of this work is to perform a comparative analysis of message passing versus shared memory as programming models for single-chip multiprocessor platforms. Our analysis is carried out from a hardware-software viewpoint: We carefully tune hardware architectures and software libraries for each programming model. We analyze representative application kernels from the multimedia domain, and identify application-level parameters that heavily influence performance and energy efficiency. Then, we formulate guidelines for the selection of the most appropriate programming model and its architectural support.