The purpose of the PAPI project is to specify a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count events, which are occurrences of specific signals and states related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis, including hand tuning, compiler optimization, debugging, benchmarking, monitoring, and performance modeling. In addition, it is hoped that this information will prove useful in the development of new compilation technology as well as in steering architectural development toward alleviating commonly occurring bottlenecks in high performance computing.
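To make the idea of correlating event counts with code structure concrete, the following is a minimal sketch in Python. It is a software toy, not the PAPI interface: the `CounterBank` class, its method names, the event names `loads` and `cache_misses`, and the tiny direct-mapped cache model are all assumptions introduced for illustration. It models a small bank of registers that count selected events, then measures the same array traversal in two loop orders.

```python
class CounterBank:
    """Toy software model of a small bank of event-counting registers.

    Hypothetical illustration only; this is not PAPI's API.
    """

    def __init__(self, num_registers=2):
        self.num_registers = num_registers
        self.events = []      # event names currently bound to registers
        self.counts = {}
        self.running = False

    def add_event(self, name):
        # Only a few events can be counted at once, as with real registers.
        if len(self.events) >= self.num_registers:
            raise ValueError("no free counter register for " + name)
        self.events.append(name)

    def start(self):
        self.counts = {name: 0 for name in self.events}
        self.running = True

    def signal(self, name):
        # Called by the simulated processor each time an event occurs.
        if self.running and name in self.counts:
            self.counts[name] += 1

    def stop(self):
        self.running = False
        return dict(self.counts)


LINE_ELEMS = 8   # array elements per cache line (assumed)
NUM_SETS = 16    # lines in the direct-mapped cache (assumed)
N = 32           # the array is N x N, stored row by row


def measure(order):
    """Traverse the array in the given loop order and return event counts."""
    bank = CounterBank(num_registers=2)
    bank.add_event("loads")
    bank.add_event("cache_misses")
    cache = {}  # set index -> line currently held in that set
    bank.start()
    for a in range(N):
        for b in range(N):
            # "row": inner loop walks along a row; otherwise down a column.
            i, j = (a, b) if order == "row" else (b, a)
            line = (i * N + j) // LINE_ELEMS
            index = line % NUM_SETS
            bank.signal("loads")
            if cache.get(index) != line:
                bank.signal("cache_misses")
                cache[index] = line
    return bank.stop()


row = measure("row")
col = measure("column")
print("row-major:   ", row)
print("column-major:", col)
```

Under these assumed cache parameters, both traversals issue 1024 loads, but the row-order loop misses 128 times while the column-order loop misses on every load. The counts expose a difference in how the two loop structures map onto the memory hierarchy, which is the kind of correlation the abstract describes.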