Data remapping for design space optimization of embedded memory systems

Authors:
Rodric M. Rabbah;Krishna V. Palem
Affiliations:
Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2003

Citing 30
Cited 20

The C programming language

The C programming language
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Stride directed prefetching in scalar processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Memory bandwidth limitations of future microprocessors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Points-to analysis in almost linear time

POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Efficient path profiling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Memory systems and pipelined processors

Memory systems and pipelined processors
Memory data organization for improved cache performance in embedded processor applications

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Cache-conscious data placement

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Cache-conscious structure layout

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Cache-conscious structure definition

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving cache performance in dynamic applications through data and computation reorganization at run time

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Automatic and efficient evaluation of memory hierarchies for embedded systems

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Automated data-member layout of heap objects to improve memory-hierarchy performance

ACM Transactions on Programming Languages and Systems (TOPLAS)
Data and memory optimization techniques for embedded systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
The hardness of cache conscious data placement

POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Dynamic hot data stream prefetching for general-purpose programs

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching

PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Design space optimization of embedded memory systems via data remapping

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design

Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
Cover story: the future of the microprocessor business

IEEE Spectrum - The future of the microprocessor business
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Journal of VLSI Signal Processing Systems
Mapping Nested Loop Algorithms into Multidimensional Systolic Arrays

IEEE Transactions on Parallel and Distributed Systems
Advanced Data Layout Optimization for Multimedia Applications

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Improving Cache Behavior of Dynamically Allocated Data Structures

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
The Memory Bandwidth Bottleneck and its Amelioration by a Compiler

IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Automatic Architectural Synthesis of VLIW and EPIC Processors

Proceedings of the 12th international symposium on System synthesis

Array regrouping and structure splitting using whole-program reference affinity

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Practical Structure Layout Optimization and Advice

Proceedings of the International Symposium on Code Generation and Optimization
Dynamic memory optimization using pool allocation and prefetching

ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Multi-compilation: capturing interactions among concurrently-executing applications

Proceedings of the 3rd conference on Computing frontiers
Restructuring field layouts for embedded memory systems

Proceedings of the conference on Design, automation and test in Europe: Proceedings
Online performance auditing: using hot optimizations without getting burned

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Data trace cache: an application specific cache architecture

MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Whole-program optimization of global variable layout

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Offline compression for on-chip ram

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Forma: A framework for safe automatic array reshaping

ACM Transactions on Programming Languages and Systems (TOPLAS)
MPADS: memory-pooling-assisted data splitting

Proceedings of the 7th international symposium on Memory management
Abstracting access patterns of dynamic memory using regular expressions

ACM Transactions on Architecture and Code Optimization (TACO)
Eliminating the call stack to save RAM

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Composition-based Cache simulation for structure reorganization

Journal of Systems Architecture: the EUROMICRO Journal
Layout transformations for heap objects using static access patterns

CC'07 Proceedings of the 16th international conference on Compiler construction
Customized placement for high performance embedded processor caches

ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
On-the-fly structure splitting for heap objects

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A framework for compiler driven design space exploration for embedded system customization

ASIAN'04 Proceedings of the 9th Asian Computing Science conference on Advances in Computer Science: dedicated to Jean-Louis Lassez on the Occasion of His 5th Cycle Birthday
Trimaran: an infrastructure for research in instruction-level parallelism

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Trace-Based data layout optimizations for multi-core processors

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this article, we present a novel linear time algorithm for data remapping, that is, (i) lightweight; (ii) fully automated; and (iii) applicable in the context of pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We proceed to demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that by virtue of locality enhancements via data remapping, we may reduce the memory subsystem needs of an application by 50%, and hence concomitantly reduce the associated costs in terms of size, power, and dollar-investment (61%). Such a reduction overcomes key hurdles in designing high-performance embedded computing solutions. Namely, memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Thus, our innovative approach offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost-impact of popular notions of locality, prefetching, regularity of memory access, and others. The proposed methodology will become increasingly important, especially as the needs for application specific embedded architectures become prevalent. In addition, we demonstrate the wide applicability of data remapping using several existing microprocessors, such as the Pentium and UltraSparc. Namely, we show that remapping can achieve a performance improvement of 20% on the average. Similarly, for a parametric research HPL-PD microprocessor, which characterizes the new Itanium machines, we achieve a performance improvement of 28% on average. All of our results are achieved using applications from the DIS, Olden and SPEC2000 suites of integer and floating point benchmarks.