Address Code and Arithmetic Optimizations for Embedded Systems

Authors:
J. Ramanujam; Satish Krishnamurthy; Jinpyo Hong; Mahmut Kandemir
Affiliations:
Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA; Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA; Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA
Venue:
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Year:
2002

Abstract

An important class of problems in both the embedded systems and scientific domains performs memory-intensive computations on large data sets. These data sets are typically stored in main memory, so the compiler must generate the address of a memory location in order to store each data element and generate the same address again when the element is subsequently retrieved. This memory address computation is quite expensive, and if it is not performed efficiently, performance degrades significantly. In this paper, we develop a new compiler approach for optimizing the memory performance of subscripted (array) variables and their address generation in stencil problems, which are common in embedded image processing and other applications. Our approach relies on the observation that in all these stencils, most of the elements accessed are stored close to one another in memory. We optimize the stencil codes with a view to reducing both the arithmetic and the address computation overhead. The regularity of the access pattern and the reuse of data elements between successive iterations of the loop body mean that there is a common sub-expression between any two successive iterations; these common sub-expressions are difficult to detect using state-of-the-art compiler technology. If the value of the common sub-expression is stored in a scalar, then the next iteration can use the value in this scalar instead of performing the computation all over again. This greatly reduces the arithmetic overhead. Since only one scalar is stored in a register, there is almost no register pressure. In addition, all array accesses are replaced by pointer dereferences, where the pointers are incremented after each iteration. This reduces the address computation overhead. Our solution is the only one so far to exploit both scalar conversion and common sub-expressions. Extensive experimental results on several codes show that our approach performs better than the other approaches.
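
The idea described in the abstract can be illustrated with a small hand-written sketch. It is not the paper's algorithm or its generated code. The 2x2 box-sum stencil, the function names `box2x2_naive` and `box2x2_opt`, and the row-major `int` layout are all assumptions chosen for illustration. In this stencil, the column sum used on the right side of one iteration is the column sum needed on the left side of the next, so a single scalar can carry it across iterations, and the subscript arithmetic can be replaced by pointers that advance by one element per iteration.

```c
#include <stddef.h>

/* Baseline: 2x2 box-sum stencil written with explicit subscripts.
 * `in` is rows x cols and `out` is (rows-1) x (cols-1), both row-major.
 * Each output costs four loads, three additions, and four index
 * computations. */
void box2x2_naive(const int *in, int *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i + 1 < rows; i++)
        for (size_t j = 0; j + 1 < cols; j++)
            out[i * (cols - 1) + j] =
                in[i * cols + j]       + in[i * cols + j + 1] +
                in[(i + 1) * cols + j] + in[(i + 1) * cols + j + 1];
}

/* Hand-optimized variant (illustrative assumption, not the paper's output).
 * The column sum in[i][j+1] + in[i+1][j+1] is the sub-expression shared by
 * iterations j and j+1, so it is kept in the scalar `carry`. Array
 * subscripts are replaced by pointers incremented once per iteration. */
void box2x2_opt(const int *in, int *out, size_t rows, size_t cols)
{
    if (rows < 2 || cols < 2)
        return;                          /* no 2x2 window fits */

    for (size_t i = 0; i + 1 < rows; i++) {
        const int *top = in + i * cols;  /* walks row i     */
        const int *bot = top + cols;     /* walks row i + 1 */
        int *dst = out + i * (cols - 1); /* walks output row i */

        int carry = *top++ + *bot++;     /* column sum for j = 0 */
        for (size_t j = 0; j + 1 < cols; j++) {
            int next = *top++ + *bot++;  /* column sum for j + 1 */
            *dst++ = carry + next;       /* reuse the previous column sum */
            carry = next;                /* becomes the left column next time */
        }
    }
}
```

In this sketch each output of the optimized loop needs two loads and two additions instead of four loads and three additions, and only `carry` stays live from one iteration to the next. Calling both functions on the same input should produce identical output arrays.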