Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications

Authors:
Victor De La Luz;Mahmut Kandemir
Affiliations:
-;-
Venue:
IEEE Transactions on Computers
Year:
2004

Citing 38
Cited 4

Strategies for cache and local memory management by global program transformation

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Compiling for numa parallel machines

Compiling for numa parallel machines
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The meeting graph: a new model for loop cyclic register allocation

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
System partitioning to maximize sleep time

ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
A compiler algorithm for optimizing locality in loop nests

ICS '97 Proceedings of the 11th international conference on Supercomputing
Non-singular data transformations: definition, validity and applications

ICS '97 Proceedings of the 11th international conference on Supercomputing
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A hyperplane based approach for optimizing spatial locality in loop nests

ICS '98 Proceedings of the 12th international conference on Supercomputing
Advanced compiler design and implementation

Advanced compiler design and implementation
Cache-conscious structure layout

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving cache performance in dynamic applications through data and computation reorganization at run time

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving memory hierarchy performance for irregular applications

ICS '99 Proceedings of the 13th international conference on Supercomputing
Energy-driven integrated hardware-software optimizations using SimplePower

Proceedings of the 27th annual international symposium on Computer architecture
Power aware page allocation

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Memory controller policies for DRAM power management

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design

Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Design of High-Performance Microprocessor Circuits

Design of High-Performance Microprocessor Circuits
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems

CC '02 Proceedings of the 11th International Conference on Compiler Construction
Integrating Loop and Data Transformations for Global Optimisation

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Localizing Non-Affine Array References

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories

ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
DRAM Energy Management Using Sof ware and Hardware Directed Power Mode Control

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Memory Hierarchy Management for Iterative Graph Structures

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Cache management by the compiler

Cache management by the compiler
Code generation and optimization for embedded digital signal processors

Code generation and optimization for embedded digital signal processors
Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse

Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse

Exploiting bank locality in multi-bank memories

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Combining source-to-source transformations and processor instruction set extensions for the automated design-space exploration of embedded systems

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Code transformation and instruction set extension

ACM Transactions on Embedded Computing Systems (TECS)
Optimizing local memory allocation and assignment through a decoupled approach

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Quantified Score

Hi-index	14.98

Visualization

Abstract

Abstract--One of the key challenges facing computer architects and compiler writers is the increasing discrepancy between processor cycle times and main memory access times. To alleviate this problem in array-intensive embedded signal and video processing applications, compilers may employ either control-centric transformations that change data access patterns of nested loops or data-centric transformations that modify memory layouts of multidimensional arrays. Most of the memory layout optimizations proposed so far either modify the layout of each array independently or are based on explicit data reorganizations at runtime. This paper focuses on a compiler technique, called array regrouping, that automatically maps multiple arrays into a single data (array) space to improve data access pattern. We present a mathematical framework that enables us to systematically derive suitable mappings for a given array-intensive embedded application. The framework divides the arrays accessed in a given program into several groups and each group is independently layout-transformed to improve spatial locality and reduce the number of conflict misses. As compared to the previous approaches, the proposed technique makes two new contributions: 1) It presents a graph based formulation of the array regrouping problem and 2) it demonstrates potential benefits of this aggressive array-regrouping strategy in optimizing behavior of embedded systems. Extensive experimental results demonstrate significant improvements in cache miss rates and execution times. An important advantage of this approach over the previous techniques that target conflict misses is that it reduces conflict misses without increasing the data space requirements of the application being optimized. This is a very desirable property in many embedded/portable environments where data space requirements determine the minimum physical memory capacity. In addition to performance related issues, with the increased use of embedded/portable devices, improving energy efficiency of applications is becoming a critical issue. To develop a truly energy-efficient system, energy constraints should be taken into account early in the design process, i.e., at the source level in software compilation and behavioral level in hardware compilation. Source-level optimizations are particularly important in data-dominated media applications. In this paper, we also show how our array regrouping strategy increases energy savings from using multiple low-power operating modes provided in current memory modules. Using a set of array-intensive benchmarks, we observe significant savings in memory system energy.