Analyzing the memory access behavior of applications is an essential step toward successful cache optimization, but it is a complex task that needs to be supported by appropriate tools and monitoring facilities. Currently, however, users can rely only on simulation-based approaches, which deliver a large degree of detail but are restricted in their applicability, or on hardware counters embedded in processors, which can track only a few, mostly global events and hence provide limited data.

This work presents a proposal for a novel hardware monitoring facility that combines the detail of traditional simulations with the low overhead of hardware counters. Like hardware counters, it is targeted at an implementation within the processor, giving it direct and non-intrusive access to caches and memory busses. Unlike traditional counters, however, it delivers a detailed picture of the complete memory access behavior of applications. This is achieved by generating so-called memory access histograms, which show access frequencies in relation to the application's address space. Such spatial memory access information can then be used for efficient program optimization by focusing on the code and data segments that were found to exhibit poor cache behavior.
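To make the idea of a memory access histogram concrete, the following is a minimal software sketch, not the proposed hardware design. It assumes a trace of addresses for some monitored event (for example, cache misses) is already available, and it counts events per fixed-size address range. The function names, the `base` and `bucket_size` parameters, and the sample trace are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def access_histogram(addresses, base, bucket_size):
    """Count monitored events per address bucket.

    Bucket i covers addresses [base + i*bucket_size, base + (i+1)*bucket_size).
    All parameters are illustrative assumptions, not part of the proposal.
    """
    hist = Counter()
    for addr in addresses:
        hist[(addr - base) // bucket_size] += 1
    return hist


def hottest_ranges(hist, base, bucket_size, top=3):
    """Return (first_address, last_address, count) for the most frequent buckets."""
    return [
        (base + b * bucket_size, base + (b + 1) * bucket_size - 1, count)
        for b, count in hist.most_common(top)
    ]


# Hypothetical trace of event addresses (e.g. misses), made up for this example.
trace = [0x1000, 0x1004, 0x1008, 0x1040, 0x1044, 0x2000, 0x1000]
hist = access_histogram(trace, base=0x1000, bucket_size=64)

for first, last, count in hottest_ranges(hist, base=0x1000, bucket_size=64):
    print(f"{first:#x}-{last:#x}: {count}")
```

In this sketch the address range with the highest count is the first candidate for optimization. Mapping that range back to a data structure or code segment would require symbol information, which is outside the scope of the example.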