Achieving high instruction cache performance with an optimizing compiler

Authors:
W. W. Hwu; P. P. Chang
Affiliations:
Coordinated Science Laboratory, University of Illinois, Urbana, IL; Coordinated Science Laboratory, University of Illinois, Urbana, IL
Venue:
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Year:
1989

Abstract

Increasing execution power requires high instruction issue bandwidth, while decreasing instruction encoding density and applying some code-improving techniques cause code expansion. As a result, the performance of the instruction memory hierarchy has become an important factor in overall system performance. An instruction placement algorithm has been implemented in the IMPACT-I (Illinois Microarchitecture Project using Advanced Compiler Technology - Stage I) C compiler to maximize sequential and spatial locality and to minimize mapping conflicts. This approach achieves low cache miss ratios and low memory traffic ratios for small, fast instruction caches with little hardware overhead. For ten realistic UNIX programs, we report low miss ratios (average 0.5%) and low memory traffic ratios (average 8%) for a 2048-byte, direct-mapped instruction cache using 64-byte blocks. This result compares favorably with the fully associative cache results reported by other researchers. We also present the effects of cache size, block size, block sectoring, and partial loading on cache performance. Code performance with instruction placement optimization is shown to be stable across architectures with different instruction encoding densities.
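
To make the terms in the abstract concrete, the following is a minimal Python sketch of a direct-mapped instruction cache with the stated geometry (2048 bytes, 64-byte blocks). It is not the IMPACT-I placement algorithm. It only illustrates how a mapping conflict between two code regions raises the miss ratio, and how placing the same regions at non-conflicting addresses removes it. The 4-byte instruction size, the two hypothetical function addresses, and the definition of the traffic ratio (bytes fetched from memory divided by bytes of instructions requested by the processor) are assumptions made for illustration.

```python
def simulate(addresses, cache_bytes=2048, block_bytes=64, instr_bytes=4):
    """Simulate a direct-mapped instruction cache over a fetch-address trace.

    Returns (miss_ratio, traffic_ratio). The traffic ratio here is an assumed
    definition: bytes loaded from memory / bytes requested by the processor.
    """
    num_lines = cache_bytes // block_bytes      # 32 lines for the default geometry
    tags = [None] * num_lines                   # one tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // block_bytes             # memory block number
        index = block % num_lines               # cache line this block maps to
        tag = block // num_lines                # identifies which block occupies the line
        if tags[index] != tag:                  # miss: load the whole block
            tags[index] = tag
            misses += 1
    accesses = len(addresses)
    miss_ratio = misses / accesses
    traffic_ratio = (misses * block_bytes) / (accesses * instr_bytes)
    return miss_ratio, traffic_ratio


def trace(func_a, func_b, body_bytes=64, iterations=100, instr_bytes=4):
    """Fetch trace that alternates between two hypothetical functions."""
    out = []
    for _ in range(iterations):
        for start in (func_a, func_b):
            out.extend(range(start, start + body_bytes, instr_bytes))
    return out


# Two functions placed 2048 bytes apart map to the same cache line and evict
# each other on every call; placed back to back, they occupy different lines.
conflicting = simulate(trace(0, 2048))
adjacent = simulate(trace(0, 64))
print("conflicting placement:", conflicting)
print("adjacent placement:   ", adjacent)
```

In this toy trace the only difference between the two runs is where the second function is placed, which is the kind of degree of freedom a compiler-driven placement pass can exploit without extra cache hardware.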