Compiler-controlled memory

Authors:
Keith D. Cooper;Timothy J. Harvey
Affiliations:
Computer Science Department, Rice University, 6100 Main Street, MS 132, Houston, Texas;Computer Science Department, Rice University, 6100 Main Street, MS 132, Houston, Texas
Venue:
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Year:
1998

Abstract

Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory behavior is its imperfect knowledge about the run-time behavior of the program. The compiler cannot completely predict run-time access patterns.

There is an exception to this rule. During the register allocation phase, the compiler often must insert substantial amounts of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than other memory operations in the program.

Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs: a small compiler-controlled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierarchy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traffic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such memories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips.

This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocation's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.
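
The abstract's closing claim, that a coloring approach can shrink the memory needed for spilled values, can be illustrated with a small sketch. This is not the paper's algorithm: it is a minimal greedy coloring over hypothetical live ranges, where two spilled values may share a CCM slot whenever their live ranges do not overlap. The names `interferes`, `assign_spill_slots`, and the sample data in `spilled` are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical live range of a spilled value: (first_def, last_use),
# measured in instruction indices. The numbers below are illustrative only.
LiveRange = Tuple[int, int]


def interferes(a: LiveRange, b: LiveRange) -> bool:
    """Two spilled values interfere if their live ranges overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def assign_spill_slots(ranges: Dict[str, LiveRange]) -> Dict[str, int]:
    """Greedily color the spill interference graph; each color is one slot."""
    slots: Dict[str, int] = {}
    # Visit values in order of their starting point.
    for name in sorted(ranges, key=lambda n: ranges[n]):
        # Slots already used by values that are live at the same time.
        taken = {
            slots[other]
            for other in slots
            if interferes(ranges[name], ranges[other])
        }
        slot = 0
        while slot in taken:
            slot += 1
        slots[name] = slot
    return slots


spilled = {"a": (0, 4), "b": (2, 6), "c": (5, 9), "d": (7, 12), "e": (10, 14)}
slots = assign_spill_slots(spilled)
slots_needed = max(slots.values()) + 1
print(slots, slots_needed)
```

In this toy input, giving every spilled value its own location would take five slots, while the coloring reuses slots across non-overlapping live ranges. A real allocator would work from its interference graph rather than from simple intervals.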