Static Identification of Delinquent Loads

Authors:
Vlad-Mihai Panait;Amit Sasturkar;Weng-Fai Wong
Affiliations:
-;-;-
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2004

Abstract

The effective use of processor caches is crucial to the performance of applications. It has been shown that cache misses are not evenly distributed throughout a program. In applications running on RISC-style processors, a small number of delinquent load instructions are responsible for most of the cache misses. Identification of delinquent loads is the key to the success of many cache optimization and prefetching techniques. In this paper, we propose a method for identifying delinquent loads that can be implemented at compile time. Our experiments over eighteen benchmarks from the SPEC suite show that the proposed scheme is stable across benchmarks, inputs, and cache structures: on average, it flags 10% of the loads in the benchmarks we tested, and these loads account for over 90% of all data cache misses. As far as we know, this is the first time a technique for static delinquent load identification with such a level of precision and coverage has been reported. While comparable techniques can also identify load instructions that cover 90% of all data cache misses, they do so by selecting over 50% of all load instructions in the code, resulting in a high number of false positives. If basic block profiling is used in conjunction with our heuristic, our results show that it is possible to pin down just 1.3% of the load instructions, which account for 82% of all data cache misses.
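The two figures quoted throughout the abstract, the share of loads a scheme flags and the share of data cache misses those loads cover, can be illustrated with a minimal sketch. This is not the paper's heuristic: the function name `selection_metrics`, the load labels, and the miss counts below are hypothetical, chosen only to show how the two ratios are computed.

```python
def selection_metrics(miss_counts, flagged):
    """Return (fraction of loads flagged, fraction of misses covered).

    miss_counts: dict mapping each static load (hypothetical label) to the
                 number of data cache misses it caused.
    flagged:     iterable of loads that some scheme marked as delinquent.
    """
    total_loads = len(miss_counts)
    total_misses = sum(miss_counts.values())
    # Ignore flagged labels that are not loads in the program.
    flagged = set(flagged) & set(miss_counts)

    selected_fraction = len(flagged) / total_loads if total_loads else 0.0
    covered = sum(miss_counts[load] for load in flagged)
    coverage = covered / total_misses if total_misses else 0.0
    return selected_fraction, coverage


# Hypothetical program with ten static loads and a skewed miss distribution.
counts = [900, 40, 20, 10, 10, 8, 5, 4, 2, 1]
miss_counts = {f"load_{i}": m for i, m in enumerate(counts)}

# Flagging only the single heaviest load.
selected_fraction, coverage = selection_metrics(miss_counts, {"load_0"})
print(f"flagged {selected_fraction:.0%} of loads, covering {coverage:.0%} of misses")
```

A scheme is attractive when the first ratio is small and the second is large: flagging every load trivially reaches full coverage, but leaves an optimizer with no guidance about where to prefetch.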