Cache optimizations use code transformations to increase the locality of memory accesses and prefetching techniques to hide memory latency. For best performance, the hardware prefetching units of processors should be complemented with software prefetch instructions. A cache simulation enhanced with a model of a hardware prefetcher is presented and used to run the code of a 3D multigrid solver. Cache misses that the hardware prefetcher does not predict can thus be identified and handled by inserting prefetch instructions. Additionally, Interleaved Block Prefetching (IBPF) is presented. Measurements show its potential.
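
As a rough illustration of the idea of simulating a cache together with a hardware prefetcher, the following minimal sketch models a toy direct-mapped cache with an optional next-line prefetcher. It is not the simulator described in the abstract: the function name `simulate`, the cache geometry, and the next-line policy are all assumptions chosen for brevity. The addresses that still miss with the prefetcher enabled are the kind of accesses that would be candidates for software prefetch instructions.

```python
def simulate(addresses, num_lines=64, line_size=64, next_line_prefetch=False):
    """Return the addresses that miss in a toy direct-mapped cache.

    Hypothetical model: `num_lines` sets of one line each, `line_size` bytes
    per line. If `next_line_prefetch` is set, every miss also loads the
    following line, a very simple stand-in for a hardware prefetcher.
    """
    cache = [None] * num_lines  # cached line number per set, None if empty
    misses = []
    for addr in addresses:
        line = addr // line_size
        idx = line % num_lines
        if cache[idx] != line:
            misses.append(addr)
            cache[idx] = line
            if next_line_prefetch:
                nxt = line + 1
                cache[nxt % num_lines] = nxt  # prefetched line replaces its set
    return misses


# Sequential sweep: consecutive 8-byte elements over 4 KiB.
sequential = list(range(0, 4096, 8))
# Strided sweep: one access every two cache lines.
strided = list(range(0, 4096, 128))

for name, trace in (("sequential", sequential), ("strided", strided)):
    without = len(simulate(trace))
    with_pf = len(simulate(trace, next_line_prefetch=True))
    print(f"{name}: {without} misses without prefetcher, {with_pf} with")
```

In this toy model the next-line prefetcher halves the misses of the sequential sweep but removes none of the misses of the strided sweep, so the strided accesses are the ones that software prefetching would have to cover.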