Reducing memory requirements of nested loops for embedded systems

Authors:
J. Ramanujam;Jinpyo Hong;Mahmut Kandemir;A. Narayan
Affiliations:
Louisiana State University, Baton Rouge, LA;Louisiana State University, Baton Rouge, LA;Pennsylvania State University, State College, PA;Louisiana State University, Baton Rouge, LA
Venue:
Proceedings of the 38th annual Design Automation Conference
Year:
2001

Abstract

Most embedded systems have a limited amount of memory. In contrast, the memory requirements of code (in particular loops) running on embedded systems are significant. This paper addresses the problem of estimating the amount of memory needed for transfers of data in embedded systems. The problem of estimating the region associated with a statement, or the set of elements referenced by a statement during the execution of the entire set of nested loops, is analyzed. A quantitative analysis of the number of elements referenced is presented; exact expressions for uniformly generated references and close upper and lower bounds for non-uniformly generated references are derived. In addition to presenting an algorithm that computes the total memory required, we discuss the effect of transformations on the lifetimes of array variables, i.e., the time between the first and last accesses to a given array location. A detailed analysis of the effect of unimodular transformations on data locality, including the calculation of the maximum window size, is presented. The term "maximum window size" is introduced, and quantitative expressions are derived to compute it. The smaller the maximum window size, the higher the data locality in the loop.
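
The quantities named in the abstract can be illustrated with a small brute-force sketch. This is not the paper's method, which derives closed-form expressions; it simply enumerates the iterations of a two-deep loop nest. The helper names (`trace`, `distinct_elements`, `max_window_size`) are hypothetical. The window definition used here is an assumption: an array element counts as live at time t if it has already been accessed and will be accessed again later. Loop interchange stands in as one simple example of a unimodular transformation.

```python
def trace(n1, n2, ref, interchanged=False):
    """Return the array indices touched by `ref(i, j)` in execution order.

    The nest is `for i in range(n1): for j in range(n2)`; with
    interchanged=True the j loop becomes the outer loop.
    """
    if interchanged:
        return [ref(i, j) for j in range(n2) for i in range(n1)]
    return [ref(i, j) for i in range(n1) for j in range(n2)]


def distinct_elements(accesses):
    """Number of distinct array elements referenced by the whole nest."""
    return len(set(accesses))


def max_window_size(accesses):
    """Largest number of simultaneously live elements (assumed definition).

    An element is treated as live at time t when first[e] <= t < last[e],
    i.e. between its first and its last access.
    """
    first, last = {}, {}
    for t, e in enumerate(accesses):
        first.setdefault(e, t)  # record only the first access time
        last[e] = t             # overwrite until the last access time
    # Sweep over time using +1 / -1 events at lifetime boundaries.
    delta = [0] * (len(accesses) + 1)
    for e in first:
        delta[first[e]] += 1
        delta[last[e]] -= 1
    live = best = 0
    for d in delta:
        live += d
        best = max(best, live)
    return best


if __name__ == "__main__":
    ref = lambda i, j: j  # every outer iteration re-reads A[0..n2-1]
    original = trace(4, 3, ref)
    swapped = trace(4, 3, ref, interchanged=True)
    print("distinct elements:", distinct_elements(original))
    print("max window, original order:", max_window_size(original))
    print("max window, interchanged:", max_window_size(swapped))
```

For the reference `A[j]`, interchanging the loops makes all accesses to each element consecutive, so the window shrinks from the full row to a single element. That matches the abstract's statement that a smaller maximum window size indicates higher data locality.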