To improve a program's memory performance, programmers and compiler writers can transform the application so that its memory-referencing behavior better exploits the memory hierarchy. The difficulty lies in statically analyzing or reasoning about an application's referencing behavior and interactions. In addition, many performance-monitoring tools collect only high-level information that is not detailed enough to diagnose specific memory performance bugs. We describe MemSpy, a performance-monitoring tool we designed to help programmers discern where and why memory bottlenecks occur. MemSpy guides programmers toward program transformations that improve memory performance by providing detailed statistics on the causes and frequency of cache misses. Because of the natural link between data-reference patterns and memory performance, MemSpy helps programmers understand how data structures and code segments interact by displaying statistics in terms of both the program's data and code structures, rather than for code structures alone.
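The idea of reporting cache-miss statistics per data structure and per code segment, broken down by cause, can be illustrated with a toy example. The following is a minimal sketch and not MemSpy's actual implementation: it assumes a tiny direct-mapped cache, a memory trace given as (procedure, address) pairs, and just two miss causes (first reference versus replacement). The names `profile`, `LINE_SIZE`, `NUM_LINES`, and the objects `A` and `B` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

LINE_SIZE = 16  # bytes per cache line (assumed for this sketch)
NUM_LINES = 4   # lines in the toy direct-mapped cache (assumed)


def profile(trace, objects):
    """Attribute cache misses to (procedure, data object) pairs.

    trace:   iterable of (procedure_name, byte_address) references.
    objects: dict mapping object name -> (base_address, size_in_bytes).
    Returns a dict {(procedure, object): {"cold": n, "replacement": n}}.
    """
    cache = [None] * NUM_LINES  # memory block currently held by each line
    seen = set()                # blocks that have been referenced before
    stats = defaultdict(lambda: {"cold": 0, "replacement": 0})

    def owner(addr):
        # Map an address back to the data object that contains it.
        for name, (base, size) in objects.items():
            if base <= addr < base + size:
                return name
        return "<unknown>"

    for proc, addr in trace:
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        if cache[index] == block:
            continue  # cache hit: nothing to record
        # A miss on a never-referenced block is a cold miss; otherwise the
        # block was evicted earlier, so count it as a replacement miss.
        cause = "replacement" if block in seen else "cold"
        stats[(proc, owner(addr))][cause] += 1
        seen.add(block)
        cache[index] = block
    return dict(stats)


# Two 64-byte arrays whose first lines map to the same cache line.
objects = {"A": (0, 64), "B": (64, 64)}
trace = [("init", 0), ("init", 64),
         ("compute", 0), ("compute", 64), ("compute", 68)]
result = profile(trace, objects)
for (proc, obj), counts in sorted(result.items()):
    print(proc, obj, counts)
```

In this trace, `init` incurs only cold misses, while `compute` incurs replacement misses on both `A` and `B` because the two arrays conflict in the cache. A code-only profile would show that `compute` misses; the data-oriented breakdown additionally shows which objects are interfering, which is the kind of information that points toward a transformation such as changing the data layout.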