A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Register allocation via graph coloring
Register allocation via graph coloring
Compiling for numa parallel machines
Compiling for numa parallel machines
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Energy-oriented compiler optimizations for partitioned memory architectures
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Memory controller policies for DRAM power management
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Integrating Loop and Data Transformations for Global Optimisation
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
DRAM Energy Management Using Sof ware and Hardware Directed Power Mode Control
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Automatic computation and data decomposition for multiprocessors
Automatic computation and data decomposition for multiprocessors
Translation validation for PRES+ models of parallel behaviours via an FSMD equivalence checker
VDAT'12 Proceedings of the 16th international conference on Progress in VLSI Design and Test
Hi-index | 0.00 |
The next generation embedded architectures are expected to accommodate multiple processors on the same chip. While this makes interprocessor communication less costly as compared to traditional high-end parallel machines, it also makes off-chip requests very costly. In particular, frequent off-chip memory accesses do not only increase execution cycles but also increase overall power consumption. One way of alleviating this power problem is to divide the off-chip memory into multiple banks, each of which can be power-controlled independently using low-power operating modes.In this article, we focus on a multiprocessor-system-on-a-chip (MPSoC) architecture with a banked memory system, and show how code and data optimizations can help us reduce memory energy consumption for embedded applications with regular data access patterns, for example, those from the embedded image and video processing domain. This is achieved by ensuring bank locality, which means that each processor localizes its accesses into a small set of banks in a given time period. We present a mathematical formulation of the bank locality problem. Our formulation is based on constructing a set of matrix equations that capture the mappings between the data, computation, processor, and memory bank spaces. Based on this formulation, we propose a heuristic solution to the bank locality problem under different scenarios. Our solution involves an iterative process through which we try to satisfy as many matrix constraints as possible; the unsatisfied constraints represent the degree of degradation in bank locality. Finally, we report extensive experimental results showing the effectiveness of our strategy in practice. Our results show that the proposed solution improves bank locality significantly, and reduces the overall memory system energy consumption by up to 34% over an approach that makes use of the low-power modes but does not employ our strategy.