The effective use of processor caches is crucial to the performance of applications. It has been shown that cache misses are not evenly distributed throughout a program. In applications running on RISC-style processors, a small number of delinquent load instructions are responsible for most of the cache misses. Identification of delinquent loads is the key to the success of many cache optimization and prefetching techniques. In this paper, we propose a method for identifying delinquent loads that can be implemented at compile time. Our experiments over eighteen benchmarks from the SPEC suite show that our proposed scheme is stable across benchmarks, inputs, and cache structures, identifying an average of 10% of the total number of loads in the benchmarks we tested that account for over 90% of all data cache misses. As far as we know, this is the first time a technique for static delinquent load identification with such a level of precision and coverage has been reported. While comparable techniques can also identify load instructions that cover 90% of all data cache misses, they do so by selecting over 50% of all load instructions in the code, resulting in a high number of false positives. If basic block profiling is used in conjunction with our heuristic, then our results show that it is possible to pin down just 1.3% of the load instructions that account for 82% of all data cache misses.