Procedure placement using temporal-ordering information

Authors:
Nikolas Gloy; Michael D. Smith
Affiliations:
Appliant, Inc., Seattle, WA; Harvard Univ., Cambridge, MA
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
1999

Abstract

Instruction cache performance is important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate and the instruction working set size during execution. This means that the performance of an executable can be improved by applying a code-placement algorithm that minimizes instruction cache conflicts and improves spatial locality. We describe an algorithm for procedure placement, one type of code placement, that significantly differs from previous approaches in the type of information used to drive the placement algorithm. In particular, we gather temporal-ordering information that summarizes the interleaving of procedures in a program trace. Our algorithm uses this information along with cache configuration and procedure size information to better estimate the conflict cost of a potential procedure ordering. It optimizes the procedure placement for single-level and multilevel caches. In addition to reducing instruction cache conflicts, the algorithm simultaneously minimizes the instruction working set size of the program. We compare the performance of our algorithm with a particularly successful procedure-placement algorithm and show noticeable improvements in the instruction cache behavior, while maintaining the same instruction working set size.
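
To make the idea of temporal-ordering information concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a trace is simply a list of procedure names, counts how often one procedure runs between two consecutive executions of another, and then scores a candidate ordering for a direct-mapped cache by summing the counts of procedure pairs that share a cache line. The function names, the `window` bound, and the overlap-based cost are all hypothetical simplifications chosen for illustration; the abstract does not specify the actual data structure or cost metric.

```python
from collections import Counter


def interleaving_counts(trace, window=8):
    """Summarize how procedures interleave in a trace (illustrative only).

    For each unordered pair {p, q}, count how many times q executed
    between two consecutive executions of p. `window` is a hypothetical
    bound on how many distinct recent procedures are remembered.
    """
    recent = []          # distinct recent procedures, oldest first
    counts = Counter()
    for proc in trace:
        if proc in recent:
            idx = recent.index(proc)
            # Everything seen since the previous execution of proc
            # interleaved with it and could have evicted its code.
            for other in recent[idx + 1:]:
                counts[frozenset((proc, other))] += 1
            recent.pop(idx)
        recent.append(proc)
        if len(recent) > window:
            recent.pop(0)
    return counts


def cache_lines_used(order, sizes, num_lines):
    """Map each procedure to the set of cache lines it occupies when the
    procedures are laid out contiguously in `order`. Sizes are given in
    cache lines; the cache is assumed to be direct-mapped."""
    lines = {}
    start = 0
    for proc in order:
        lines[proc] = {(start + i) % num_lines for i in range(sizes[proc])}
        start += sizes[proc]
    return lines


def conflict_cost(order, sizes, counts, num_lines):
    """Estimate conflicts for an ordering: add the interleaving count of
    every pair of procedures whose cache lines overlap (an assumed,
    simplified cost, not the metric from the paper)."""
    lines = cache_lines_used(order, sizes, num_lines)
    cost = 0
    for pair, weight in counts.items():
        p, q = tuple(pair)
        if lines[p] & lines[q]:
            cost += weight
    return cost


if __name__ == "__main__":
    trace = list("ABACBA")                 # toy trace of procedure names
    sizes = {"A": 2, "B": 2, "C": 2}       # sizes in cache lines
    counts = interleaving_counts(trace)
    for order in (["A", "B", "C"], ["A", "C", "B"]):
        print(order, conflict_cost(order, sizes, counts, num_lines=4))
```

In this toy trace, A and B interleave most often, so an ordering that maps them to the same cache lines scores worse than one that lets A overlap with the rarely interleaved C. A placement search would compare such scores across candidate orderings.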