The purpose of the PAPI project is to specify a standard API for accessing the hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count “events”, which are occurrences of specific signals and states related to the processor's function. Monitoring these events makes it possible to correlate the structure of source/object code with the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis and tuning. The PAPI project has proposed a standard set of hardware events and a standard cross-platform library interface to the underlying counter hardware. The PAPI library has been, or is in the process of being, implemented on all major HPC platforms. The PAPI project is also developing end-user tools for dynamically selecting and displaying hardware counter performance data, and PAPI support is being incorporated into a number of third-party tools.
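The idea of a standard event set layered over platform-specific counter registers can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not the PAPI API: the class names (`CounterSet`, `FakeHardware`), the portable event names (`TOTAL_CYCLES`, `L1_DATA_MISSES`), the platform names, and the register names are all hypothetical, and the "hardware" is simulated in plain Python so the example runs anywhere.

```python
# Toy model of a portable counter interface. NOT the real PAPI API:
# every name below is hypothetical and chosen only for illustration.

# Portable event names mapped to made-up platform-specific register names.
PRESET_TO_NATIVE = {
    "platform_a": {"TOTAL_CYCLES": "cyc_reg0", "L1_DATA_MISSES": "dmiss_reg1"},
    "platform_b": {"TOTAL_CYCLES": "clk_ticks"},  # no L1 miss counter here
}


class FakeHardware:
    """Simulated counter registers that only ever count upward."""

    def __init__(self):
        self.registers = {"cyc_reg0": 0, "dmiss_reg1": 0, "clk_ticks": 0}

    def read(self, register_name):
        return self.registers[register_name]

    def run(self, cycles, misses):
        # Pretend some code executed and the registers advanced.
        self.registers["cyc_reg0"] += cycles
        self.registers["clk_ticks"] += cycles
        self.registers["dmiss_reg1"] += misses


class CounterSet:
    """Groups portable events and reports how much each one advanced."""

    def __init__(self, platform, read_native):
        self._mapping = PRESET_TO_NATIVE[platform]
        self._read_native = read_native
        self._events = []
        self._baseline = None

    def add_event(self, preset):
        # A portable event is usable only if the platform can count it.
        if preset not in self._mapping:
            raise LookupError(f"{preset} is not available on this platform")
        self._events.append(preset)

    def _snapshot(self):
        return {e: self._read_native(self._mapping[e]) for e in self._events}

    def start(self):
        self._baseline = self._snapshot()

    def stop(self):
        if self._baseline is None:
            raise RuntimeError("stop() called before start()")
        end = self._snapshot()
        deltas = {e: end[e] - self._baseline[e] for e in self._events}
        self._baseline = None
        return deltas


def measure(platform, events, hardware, workload):
    """Count the given portable events around one call to workload()."""
    counters = CounterSet(platform, hardware.read)
    for event in events:
        counters.add_event(event)
    counters.start()
    workload()
    return counters.stop()
```

In this sketch the caller names only portable events; the per-platform table decides which register is read, and requesting an event that a platform cannot count fails at `add_event` rather than silently returning zero. Reporting differences between a start and a stop snapshot is what lets the counts be attributed to a specific region of code.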