Enabling Data Structure Oriented Performance Analysis with Hardware Performance Counter Support
Euro-Par 2008 Workshops - Parallel Processing
Poor cache behavior can severely limit the speedup and scalability of parallel applications, so optimizing a program for cache locality can yield a considerable performance gain. Programmers therefore commonly perform cache locality optimization to reach the performance they expect from their applications. In this work, we developed a data profiling tool, dprof, that supports users in this task by helping them detect the optimization targets in their programs. In contrast to similar tools, which mostly focus on code regions, we address data structures because they are the objects that programmers work with directly. Based on the Performance Monitoring Unit (PMU) provided by modern processors, dprof can find cache-critical variables, arrays, or even a segment of an array. It can also trace these access hotspots to concrete positions in the code, such as individual functions and code lines. This feature allows the user to apply dprof for efficient cache optimization.
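To make the idea of data-structure-oriented attribution concrete, the following is a minimal sketch and not dprof's actual implementation. It assumes that PMU sampling has already produced a list of (miss address, source location) pairs and that a symbol table with the base address and size of each data structure is available. The names `SYMBOLS`, `SAMPLES`, `attribute_misses`, and the 1024-byte segment size are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (name, base address, size in bytes).
SYMBOLS = [
    ("a", 0x1000, 4096),
    ("b", 0x3000, 8192),
    ("count", 0x6000, 8),
]

# Hypothetical PMU samples: (address of the missing access, source location).
SAMPLES = [
    (0x1010, "solver.c:42"),
    (0x1810, "solver.c:42"),
    (0x1820, "solver.c:57"),
    (0x3004, "init.c:10"),
    (0x6000, "solver.c:60"),
    (0x9000, "solver.c:61"),  # lies outside every known structure
]


def attribute_misses(samples, symbols, segment_bytes=1024):
    """Count sampled misses per structure, per segment, and per code line."""
    table = sorted(symbols, key=lambda entry: entry[1])
    bases = [base for _, base, _ in table]
    per_structure = Counter()
    per_segment = Counter()
    per_location = Counter()
    for address, location in samples:
        # Find the last structure whose base address is <= the miss address.
        index = bisect_right(bases, address) - 1
        if index < 0:
            continue
        name, base, size = table[index]
        if address >= base + size:
            continue  # address falls in a gap between structures
        per_structure[name] += 1
        per_segment[(name, (address - base) // segment_bytes)] += 1
        per_location[(name, location)] += 1
    return per_structure, per_segment, per_location


if __name__ == "__main__":
    structures, segments, locations = attribute_misses(SAMPLES, SYMBOLS)
    for name, misses in structures.most_common():
        print(name, misses)
```

In this toy input, three of the five attributable samples fall into array `a`, and two of those land in the same 1024-byte segment, which is the kind of observation that would point a programmer at one part of one array rather than at a whole code region.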