Latency-hiding techniques such as multilevel cache hierarchies yield high performance when applications map well onto hierarchy implementations, but performance can suffer drastically when they do not. Identifying and reducing mismatches between an application and the memory hierarchy is difficult without insight into the actual behavior of the hardware implementation. We advocate the use of hardware event counters as a cheap, effective, and practical way to tune applications for a given hardware platform. We take a case-study approach, focusing on the counters available on the SPARCcenter 2000, a 20-processor, shared-bus multiprocessor. We describe the tools we built to relate hardware event counts to user applications and give examples that illustrate how these tools are useful in practice. We conclude with a critique of the current hardware counters, offering a user's perspective on how they could be redesigned to be more effective.