This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec per 333 MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
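As a rough illustration of the numbers above, the following Python sketch works out the average spacing between samples implied by the stated rate and clock speed, and shows the general idea of turning program-counter samples into a per-instruction estimate of where time goes. It is not the system's implementation: the function names and the sample addresses are hypothetical, and only the two constants come from the abstract.

```python
from collections import Counter

CLOCK_HZ = 333_000_000    # 333 MHz processor (from the abstract)
SAMPLES_PER_SEC = 5200    # stated lower bound on the sampling rate (from the abstract)


def mean_cycles_between_samples(clock_hz, samples_per_sec):
    """Average number of clock cycles that elapse between two samples."""
    return clock_hz / samples_per_sec


def attribute_samples(sampled_pcs):
    """Estimate each address's share of execution time from PC samples.

    Assumes samples are spread roughly evenly over time, so an address
    that appears in more samples is where more time was spent.
    """
    counts = Counter(sampled_pcs)
    total = sum(counts.values())
    return {pc: n / total for pc, n in counts.items()}


# Hypothetical sampled program-counter values, for illustration only.
samples = [0x1000, 0x1004, 0x1004, 0x1004, 0x1008, 0x1004, 0x1000, 0x1004]

interval = mean_cycles_between_samples(CLOCK_HZ, SAMPLES_PER_SEC)
shares = attribute_samples(samples)

print(f"about {interval:,.0f} cycles between samples")
for pc, share in sorted(shares.items()):
    print(f"{pc:#x}: {share:.1%} of samples")
```

At the stated rate, a sample is taken at most about once every 64,000 cycles on average, which is consistent with the low overhead the abstract reports. The per-address shares are only a first approximation; attributing samples to specific stall causes requires the kind of analysis the abstract attributes to the supplied tools.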