Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
String operations such as memcpy, memset, and memcmp account for a nontrivial share of Google datacenter resources. String operations hurt processor cache efficiency when the data they access is not reused shortly afterward. Such cache pollution can be avoided by using nontemporal memory accesses that bypass the L2/L3 caches. Because reuse distance varies greatly across different static memcpy call contexts in the same program, an efficient solution needs to be call-context sensitive. We propose a novel solution to this problem that uses the page protection mechanism to measure reuse distance and the GCC feedback-directed optimization mechanism to generate nontemporal memory access instructions at the appropriate static code contexts. First, the compiler inserts instrumentation for calls to string operations. Then, a runtime library measures reuse distance using the page protection mechanism during a representative profiling run. Finally, the compiler generates calls to specialized string operations that use nontemporal operations for the arguments with large reuse distance. We present a full implementation and initial results, including speedups on large datacenter applications.
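The selection step described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: it replays a recorded access trace instead of using page protection, and the names `forward_reuse_distances`, `choose_nontemporal`, and `cache_pages`, along with the all-distances-exceed-capacity policy, are hypothetical choices made for illustration.

```python
import math
from collections import OrderedDict, defaultdict


def forward_reuse_distances(trace):
    """Group reuse distances by the call context that touched the data.

    trace: iterable of (context, page) pairs in program order.
    The distance charged to a context is the number of distinct other
    pages touched between its access and the next access to the same
    page; math.inf if the page is never touched again.
    """
    stack = OrderedDict()  # page -> context of its latest access, LRU order
    per_context = defaultdict(list)
    for context, page in trace:
        if page in stack:
            pages = list(stack)
            # Pages more recently used than `page` sit after it in the stack.
            distance = len(pages) - 1 - pages.index(page)
            per_context[stack[page]].append(distance)
            del stack[page]
        stack[page] = context  # (re)insert as most recently used
    for context in stack.values():
        per_context[context].append(math.inf)  # never reused
    return dict(per_context)


def choose_nontemporal(per_context, cache_pages):
    """Assumed policy: bypass the cache only for contexts whose every
    observed reuse distance is at least the cache capacity in pages."""
    return {ctx for ctx, ds in per_context.items() if min(ds) >= cache_pages}


# Hypothetical trace: "copy_hot" fills a page that "use_hot" reads at once,
# while "copy_log" writes pages that are never touched again.
trace = []
for i in range(8):
    trace += [("copy_hot", 1), ("use_hot", 1), ("copy_log", 100 + i)]

profile = forward_reuse_distances(trace)
print(choose_nontemporal(profile, cache_pages=4))
```

In this trace only the `copy_log` context qualifies, since its data is never reused, whereas `copy_hot` is always followed by an immediate reuse and should keep using ordinary cached stores.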