Processor speed continues to increase faster than the speed of access to main memory, making effective use of memory caches more important. Information about an application's interaction with the cache is therefore critical to performance tuning. To be most useful, tools that measure this information should relate it to the source code level data structures in an application. We describe how to gather such information by using hardware performance counters to sample cache miss addresses, and present a new tool named Cache Scope that does this using the Intel Itanium 2 performance monitors. We present experimental results concerning Cache Scope's accuracy and perturbation of cache behavior. We also describe a case study of using Cache Scope to tune two applications, achieving 24% and 19% reductions in running time.
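Relating sampled miss addresses to source-level data structures can be illustrated with a small sketch. It assumes a tool already has (a) a table of non-overlapping address ranges, one per data structure (for example, from the symbol table for globals and from intercepted allocation calls for heap objects), and (b) a list of sampled miss addresses, where each sample stands for roughly a fixed number of misses because the counter interrupts on overflow every `sample_period` events. The `Region` type and `attribute_misses` function are hypothetical names chosen for illustration; this is not Cache Scope's actual interface or implementation.

```python
import bisect
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record: a named data structure at [start, start + size)."""
    name: str
    start: int
    size: int


def attribute_misses(regions, sampled_addrs, sample_period):
    """Attribute sampled miss addresses to data structures.

    Assumes regions do not overlap and that each sample represents about
    `sample_period` misses, so per-structure sample counts are scaled by it.
    Addresses outside every region are grouped under "<unknown>".
    """
    ordered = sorted(regions, key=lambda r: r.start)
    starts = [r.start for r in ordered]
    counts = Counter()
    for addr in sampled_addrs:
        # Find the last region starting at or below addr.
        i = bisect.bisect_right(starts, addr) - 1
        if i >= 0 and addr < ordered[i].start + ordered[i].size:
            counts[ordered[i].name] += 1
        else:
            counts["<unknown>"] += 1
    # Scale sample counts to estimated total misses per structure.
    return {name: n * sample_period for name, n in counts.items()}


if __name__ == "__main__":
    regions = [Region("grid", 0x1000, 0x800), Region("particles", 0x2000, 0x400)]
    samples = [0x1004, 0x17F0, 0x2010, 0x3000]
    print(attribute_misses(regions, samples, 1000))
```

Sorting the ranges once and using binary search keeps each lookup logarithmic in the number of data structures, which matters when many heap objects are tracked. A real tool would also have to handle allocations and frees that change the range table while the program runs.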