ATUM: a new technique for capturing address traces using microcode
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Software—Practice & Experience
StormWatch: a tool for visualizing memory system protocols
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Using hardware performance monitors to isolate memory bottlenecks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A scalable cross-platform infrastructure for application performance tuning using hardware counters
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications
IEEE Transactions on Parallel and Distributed Systems
The Augmint multiprocessor simulation toolkit for Intel x86 architectures
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
SvPablo: A Multi-Language Architecture-Independent Performance Analysis System
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
An empirical performance evaluation of scalable scientific applications
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Locality-Based Online Trace Compression
IEEE Transactions on Computers
Detailed cache coherence characterization for OpenMP benchmarks
Proceedings of the 18th annual international conference on Supercomputing
Performance Tool Support for MPI-2 on Linux
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Data Centric Cache Measurement on the Intel ltanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Memory Profiling using Hardware Counters
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Identifying and Exploiting Spatial Regularity in Data Memory References
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
EMPS: An Environment for Memory Performance Studies
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Owl: next generation system monitoring
Proceedings of the 2nd conference on Computing frontiers
Fast data-locality profiling of native execution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A hybrid hardware/software approach to efficiently determine cache coherence Bottlenecks
Proceedings of the 19th annual international conference on Supercomputing
Using Dynamic Tracing Sampling to Measure Long Running Programs
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
ALITER: an asynchronous lightweight instrumentation tool for event recording
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Analysis of cache-coherence bottlenecks with hybrid hardware/software techniques
ACM Transactions on Architecture and Code Optimization (TACO)
An efficient single-pass trace compression technique utilizing instruction streams
ACM Transactions on Modeling and Computer Simulation (TOMACS)
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
IEEE Transactions on Parallel and Distributed Systems
A productivity centered application performance tuning framework
Proceedings of the 2nd international conference on Performance evaluation methodologies and tools
Prediction and trace compression of data access addresses through nested loop recognition
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Open | SpeedShop: An open source infrastructure for parallel performance analysis
Scientific Programming - Large-Scale Programming Tools and Environments
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing
Journal of Parallel and Distributed Computing
A Holistic Approach towards Automated Performance Analysis and Tuning
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Locality behavior of parallel and sequential algorithms for irregular graph problems
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Exploiting stability to reduce time-space cost for memory tracing
ICCS'03 Proceedings of the 2003 international conference on Computational science: PartIII
Memory Trace Compression and Replay for SPMD Systems using Extended PRSDs?
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
DeFT: Design space exploration for on-the-fly detection of coherence misses
ACM Transactions on Architecture and Code Optimization (TACO)
A study on the locality behavior of minimum spanning tree algorithms
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Supporting cache locality optimization with a toolset
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Cache optimizations for iterative numerical codes aware of hardware prefetching
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Collecting and exploiting cache-reuse metrics
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Detecting application load imbalance on high end massively parallel systems
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Elastic and scalable tracing and accurate replay of non-deterministic events
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
In this paper we present SIGMA (Simulation Infrastructure to Guide Memory Analysis), a new data collection framework and family of cache analysis tools. The SIGMA environment provides detailed cache information by gathering memory reference data using software-based instrumentation. This infrastructure can facilitate quick probing into the factors that influence the performance of an application by highlighting bottleneck scenarios including: excessive cache/TLB misses and inefficient data layouts. The tool can also assist in perturbation analysis to determine performance variations caused by changes to architecture or program. Our validation tests using the SPEC Swim benchmark show that most of the performance metrics obtained with SIGMA are within 1% of the metrics obtained with hardware performance counters, with the advantage that SIGMA provides performance data on a data structure level, as specified by the programmer.