Elements of information theory
Elements of information theory
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Pentium 4 Performance-Monitoring Features
IEEE Micro
MPX: Software for Multiplexing Hardware Performance Counters in Multithreaded Programs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Modeling Superscalar Processors via Statistical Simulation
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Quantifying Instruction Criticality
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Scalable analysis techniques for microprocessor performance counter metrics
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Computer Architecture: A Quantitative Approach
Computer Architecture: A Quantitative Approach
Characterizing and Predicting Program Behavior and its Variability
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
A First-Order Superscalar Processor Model
Proceedings of the 31st annual international symposium on Computer architecture
Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Using hardware performance monitors to understand the behavior of java applications
VM'04 Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3
Rapidly Selecting Good Compiler Optimizations using Performance Counters
Proceedings of the International Symposium on Code Generation and Optimization
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Hardware counter driven on-the-fly request signatures
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Evaluating Sampling Based Hotspot Detection
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Enhancing operating system support for multicore processors by using hardware performance monitoring
ACM SIGOPS Operating Systems Review
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Proceedings of the 7th international conference on Autonomic computing
Detailed performance analysis using coarse grain sampling
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
A System-Level Model for Runtime Power Estimation on Mobile Devices
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
Loaf: a framework and infrastructure for creating online adaptive solutions
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Pruning hardware evaluation space via correlation-driven application similarity analysis
Proceedings of the 8th ACM International Conference on Computing Frontiers
A survey on energy-efficient data management
ACM SIGMOD Record
ADP: automated diagnosis of performance pathologies using hardware events
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures
Proceedings of the 26th ACM international conference on Supercomputing
THeME: a system for testing by hardware monitoring events
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Experiences understanding performance in acommercial scale-out environment
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Detection of false sharing using machine learning
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Framework for a productive performance optimization
Parallel Computing
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
A performance-aware quality of service-driven scheduler for multicore processors
ACM SIGBED Review - Special Issue on the 3rd Embedded Operating System Workshop (EWiLi 2013)
Hi-index | 0.00 |
Hardware performance counters (HPCs) are increasingly being used to analyze performance and identify the causes of performance bottlenecks. However, HPCs are difficult to use for several reasons. Microprocessors do not provide enough counters to simultaneously monitor the many different types of events needed to form an over-all understanding of performance. Moreover, HPCs primarily count low-level micro-architectural events from which it is difficult to extract high-level insight required for identifying causes of performance problems.We describe two techniques that help overcome these difficulties, allowing HPCs to be used in dynamic real-time optimizers. First, statistical sampling is used to dynamically multiplex HPCs and make a larger set of logical HPCs available. Using real programs, we show experimentally that it is possible through this sampling to obtain counts of hardware events that are statistically similar (within 15%) to complete non-sampled counts, thus allowing us to provide a much larger set of logical HPCs. Second, we observe that stall cycles are a primary source of inefficiencies, and hence they should be major targets for software optimization. Based on this observation, we build a simple model in real-time that speculatively associates each stall cycle to a processor component that likely caused the stall. The information needed to produce this model is obtained using our HPC multiplexing facility to monitor a large number of hardware components simultaneously. Our analysis shows that even in an out-of-order superscalar micro-processor such a speculative approach yields a fairly accurate model with run-time overhead for collection and computation of under 2%.These results demonstrate that we can effective analyze on-line performance of application and system code running at full speed. The stall analysis shows where performance is being lost on a given processor.