Efficient procedure mapping using cache line coloring

Authors:
Amir H. Hashemi; David R. Kaeli; Brad Calder
Affiliations:
Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA; Dept. of Electrical and Computer Engineering, Northeastern University, Boston, MA; Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, CA
Venue:
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Year:
1997

References:

Program optimization for instruction caches. ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems.
Achieving high instruction cache performance with an optimizing compiler. ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture.
Cache and memory hierarchy design: a performance-directed approach.
Profile guided code positioning. PLDI '90: Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.
Predicting program behavior using real or estimated profiles. PLDI '91: Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation.
Predicting conditional branch directions from previous runs of a program. ASPLOS V: Proceedings of the fifth international conference on Architectural support for programming languages and operating systems.
ATOM: a system for building customized program analysis tools. PLDI '94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation.
Expected I-cache miss rates via the gap model. ISCA '94: Proceedings of the 21st annual international symposium on Computer architecture.
Avoiding conflict misses dynamically in large direct-mapped caches. ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems.
Improving the accuracy of static branch prediction using branch correlation. ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems.
Reducing branch costs via branch alignment. ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems.
The predictability of branches in libraries. Proceedings of the 28th annual international symposium on Microarchitecture.
Efficient path profiling. Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture.
Properties of the working-set model. Communications of the ACM.
Issues in Trace-Driven Simulation. Performance Evaluation of Computer and Communication Systems, Joint Tutorial Papers of Performance '93 and Sigmetrics '93.
Optimizing instruction cache performance for operating system intensive workloads. HPCA '95: Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture.
Code Reorganization for Instruction Caches.
Analysis of cache replacement-algorithms.
Efficient analysis of caching systems.

Abstract

As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques.

In this paper we present a link-time procedure mapping algorithm that can significantly improve the effectiveness of the instruction cache. Our algorithm produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph. We use cache line coloring to guide the procedure mapping, indicating which cache lines to avoid when placing a procedure in the program layout. On average, our algorithm reduces the instruction cache miss rate by 40% relative to the original mapping and by 17% relative to the mapping algorithm of Pettis and Hansen [12].
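The coloring idea described above can be illustrated with a small greedy sketch. This is not the authors' algorithm, only a simplified illustration under stated assumptions: a direct-mapped cache, each cache line index treated as a "color", call counts used as call-graph edge weights, and edges visited from heaviest to lightest (similar in spirit to Pettis and Hansen's ordering). The names `Proc`, `cache_lines`, and `color_map` are hypothetical. When a procedure is placed, the colors already occupied by its placed call-graph neighbors form the set of cache lines to avoid, and the procedure is shifted forward one line at a time until it no longer overlaps them, leaving a gap in the layout if necessary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Proc:
    name: str
    size: int                    # procedure size in bytes
    start: Optional[int] = None  # address assigned in the new layout


def cache_lines(start: int, size: int, line_size: int, num_lines: int) -> Set[int]:
    """Colors (cache line indices) a procedure occupies in a direct-mapped cache."""
    first = start // line_size
    last = (start + size - 1) // line_size
    return {line % num_lines for line in range(first, last + 1)}


def color_map(procs: Dict[str, Proc], edges: Dict[Tuple[str, str], int],
              cache_size: int, line_size: int) -> List[str]:
    """Hypothetical greedy layout: visit call edges from heaviest to lightest and
    place each unplaced procedure at the first line-aligned address (at or after
    the current end of the layout) whose colors avoid those of its placed neighbors."""
    num_lines = cache_size // line_size
    neighbors: Dict[str, Set[str]] = {name: set() for name in procs}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def align(addr: int) -> int:  # round up to a cache-line boundary
        return -(-addr // line_size) * line_size

    end, order = 0, []
    for (a, b), _weight in sorted(edges.items(), key=lambda kv: -kv[1]):
        for name in (a, b):
            p = procs[name]
            if p.start is not None:
                continue
            # Colors to avoid: lines used by already-placed call-graph neighbors.
            avoid: Set[int] = set()
            for n in neighbors[name]:
                q = procs[n]
                if q.start is not None:
                    avoid |= cache_lines(q.start, q.size, line_size, num_lines)
            base = align(end)
            p.start = base  # fallback if no conflict-free offset exists
            for k in range(num_lines):  # each shift changes the first color by one
                cand = base + k * line_size
                if not cache_lines(cand, p.size, line_size, num_lines) & avoid:
                    p.start = cand
                    break
            end = p.start + p.size
            order.append(name)

    for name, p in procs.items():  # procedures absent from the call graph go last
        if p.start is None:
            p.start = align(end)
            end = p.start + p.size
            order.append(name)
    return order


# Usage: a 128-byte direct-mapped cache with 32-byte lines (4 colors).
procs = {n: Proc(n, s) for n, s in [("main", 64), ("f", 64), ("g", 64)]}
edges = {("main", "f"): 100, ("main", "g"): 90}
order = color_map(procs, edges, cache_size=128, line_size=32)
for n in order:
    print(n, procs[n].start)  # main 0, f 64, g 192
```

In this example, a plain sequential layout would put `g` at address 128, mapping it to lines 0 and 1 and conflicting with its caller `main`; the sketch instead leaves 64 bytes of padding so that `g` lands on lines 2 and 3. A practical layout would likely want to fill such gaps with rarely executed code rather than leave them empty; that refinement is omitted here.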