Global register allocation at link time

Author: David W. Wall
Affiliation: Digital Equipment Corporation
Venue: SIGPLAN '86: Proceedings of the 1986 SIGPLAN Symposium on Compiler Construction
Year: 1986

Abstract

In previous work in global register allocation, the compiler colors a conflict graph constructed from liveness dataflow information, in order to allocate the same register to many variables that are not simultaneously live. If two procedures are in separately compiled modules, however, the compiler must do this allocation separately for each procedure. As a result, the two procedures might use different registers for the same global, or the same register for different locals.

We can remove these problems if we delay the register allocation until link time. Our compiler produces object modules that can be linked and run without global register allocation, but includes with each object module a body of information describing how the module uses variables and procedures. A link-time register allocator then decides which variables are used most frequently, selects registers for them, and rewrites the code to reflect the decision that these variables reside in registers rather than in memory. Construction of the call graph allows us to use the same register for locals of procedures that are not simultaneously active, giving us most of the advantages of a full-scale coloring without the expense.

When we use our method for 52 registers, our benchmarks speed up by 10% to 25%. Even with only 8 registers, the speedup can be nearly that large if we use previously collected profile information to guide the allocation. We cannot do much better, because programs whose variables all fit in registers rarely speed up by more than 30%. Moreover, profiling shows us that we usually remove 60% to 90% of the loads and stores of scalar variables that the program performs during its execution, and often much more.
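The abstract describes the allocator's two decisions only at a high level. First, locals of procedures that cannot be simultaneously active may share a register. Second, the most frequently used candidates get the registers. The Python sketch below illustrates one plausible way to combine these two ideas. It is not the paper's exact algorithm.

The sketch rests on these assumptions:

- Usage counts arrive as plain dicts, either static estimates or profile counts.
- The call graph is acyclic, and recursion is rejected rather than handled.
- The code-rewriting step is omitted.
- The names `assign_groups` and `allocate` and the dict layouts are hypothetical.

```python
from typing import Dict, List, Tuple

Local = Tuple[str, str]  # hypothetical layout: (procedure, variable)


def assign_groups(call_graph: Dict[str, List[str]],
                  local_freq: Dict[Local, int]) -> Dict[Local, int]:
    """Give each local a group number so that a procedure's locals never share
    a group with locals of any procedure it can (transitively) call.
    Locals of unrelated procedures may share. Assumes an acyclic call graph."""
    locals_of: Dict[str, List[str]] = {}
    for proc, var in local_freq:
        locals_of.setdefault(proc, []).append(var)
    groups: Dict[Local, int] = {}
    next_free: Dict[str, int] = {}  # first group above proc and all its callees
    active = set()                  # procedures on the current DFS path

    def visit(proc: str) -> int:
        if proc in next_free:
            return next_free[proc]
        if proc in active:
            raise ValueError(f"recursion through {proc!r} is not handled")
        active.add(proc)
        # Start above every group used by any callee, so a caller and a
        # callee (which can be active at the same time) never collide.
        start = max((visit(c) for c in call_graph.get(proc, [])), default=0)
        # Hottest locals first, so they land in the lowest groups.
        hot = sorted(locals_of.get(proc, []),
                     key=lambda v: (-local_freq[(proc, v)], v))
        for i, var in enumerate(hot):
            groups[(proc, var)] = start + i
        active.discard(proc)
        next_free[proc] = start + len(hot)
        return next_free[proc]

    for proc in sorted(set(call_graph) | set(locals_of)):
        visit(proc)
    return groups


def allocate(global_freq: Dict[str, int], local_freq: Dict[Local, int],
             call_graph: Dict[str, List[str]], num_regs: int) -> dict:
    """Rank each global and each local group by total use count and give
    one register to each of the top num_regs candidates."""
    groups = assign_groups(call_graph, local_freq)
    weight = {("global", g): f for g, f in global_freq.items()}
    for local, grp in groups.items():
        key = ("group", grp)
        weight[key] = weight.get(key, 0) + local_freq[local]
    ranked = sorted(weight, key=lambda k: (-weight[k], k[0], str(k[1])))
    reg_of = {cand: r for r, cand in enumerate(ranked[:num_regs])}
    result = {}
    for g in global_freq:
        if ("global", g) in reg_of:
            result[g] = reg_of[("global", g)]
    for local, grp in groups.items():
        if ("group", grp) in reg_of:
            result[local] = reg_of[("group", grp)]
    return result


if __name__ == "__main__":
    cg = {"main": ["f", "g"], "f": [], "g": []}
    lf = {("main", "i"): 10, ("f", "x"): 40, ("g", "y"): 30, ("f", "t"): 5}
    print(allocate({"count": 50}, lf, cg, num_regs=2))
    # prints {'count': 1, ('f', 'x'): 0, ('g', 'y'): 0}
```

In the usage example, `f` and `g` are never active at the same time, so their hot locals `x` and `y` share register 0. The global `count` gets register 1. `main` can be active alongside both callees, so its local `i` lands in a separate group and misses out with only two registers. This grouping approximates coloring using only the call graph, which matches the abstract's claim of "most of the advantages of a full-scale coloring without the expense." A real allocator would also need to handle recursion and indirect calls.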