Chip multiprocessors are gaining popularity because they are well suited to data-intensive embedded and high-end processing. In particular, array-intensive embedded image and video applications benefit substantially from the coarse-grain parallelism these architectures offer. However, if left unoptimized, interprocessor communication can be a major energy consumer. Focusing on a distributed-memory chip multiprocessor architecture and array-intensive embedded applications, this paper proposes a compiler-based communication minimization strategy built on data replication. The proposed scheme replicates shared data items across the processors' memories in a controlled fashion (i.e., under a memory limit), with the goal of eliminating interprocessor communication that would otherwise be necessary.
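To make the idea of replication under a memory limit concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each shared data block has a known size and a known count of remote accesses that a local copy would avoid, and it uses a simple greedy heuristic (most accesses avoided per byte first) as a stand-in selection policy. The names `SharedBlock`, `choose_replicas`, and `memory_limit_bytes` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedBlock:
    """Hypothetical description of one shared data item seen by a processor."""
    name: str
    size_bytes: int       # memory needed to hold a local copy (assumed > 0)
    remote_accesses: int  # accesses that need communication without a copy


def choose_replicas(blocks, memory_limit_bytes):
    """Pick blocks to replicate locally without exceeding the memory limit.

    Greedy heuristic (an assumption, not the paper's method): rank blocks by
    remote accesses avoided per byte, then take each block that still fits.
    Returns the chosen block names and the total bytes they occupy.
    """
    ranked = sorted(
        blocks,
        key=lambda b: b.remote_accesses / b.size_bytes,
        reverse=True,
    )
    chosen, used = [], 0
    for block in ranked:
        if used + block.size_bytes <= memory_limit_bytes:
            chosen.append(block.name)
            used += block.size_bytes
    return chosen, used


# Illustrative numbers only; they are not taken from the paper.
example_blocks = [
    SharedBlock("A", size_bytes=400, remote_accesses=1200),
    SharedBlock("B", size_bytes=300, remote_accesses=300),
    SharedBlock("C", size_bytes=200, remote_accesses=500),
]
replicated, bytes_used = choose_replicas(example_blocks, memory_limit_bytes=700)
```

With these illustrative inputs, blocks A and C fit within the 700-byte limit, while B is skipped because adding it would exceed the limit. A greedy ranking like this is not guaranteed to be optimal; it is used here only because it is short and easy to follow.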