Full-System Critical Path Analysis
ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Many important workloads today, such as web-hosted services, are limited not by processor core performance but by interactions among the cores, the memory system, I/O devices, and the complex software layers that tie these components together. Architects designing future systems for these workloads are challenged to identify performance bottlenecks because, as in any concurrent system, overheads in one component may be hidden by overlap with other operations. These overlaps span the user/kernel and software/hardware boundaries, making traditional performance analysis techniques inadequate. We present a methodology for identifying end-to-end critical paths across software and simulated hardware in complex networked systems. By modeling systems as collections of state machines interacting via queues, we can trace critical paths through multiplexed processing engines, identify when resources create bottlenecks (including abstract resources such as flow-control credits), and predict the benefit of eliminating bottlenecks by increasing hardware speeds or expanding available resources. We implement our technique in a full-system simulator and analyze a TCP microbenchmark, a web server, the Linux TCP/IP stack, and an Ethernet controller. From a single run of the microbenchmark, our tool correctly identifies a series of bottlenecks within minutes and predicts the performance of hypothetical systems in which these bottlenecks are successively eliminated, culminating in a total speedup of 3X. We then validate these predictions through hours of additional simulation and find them to be accurate to within 1% to 17%. We also analyze the web server, find it to be CPU-bound, and predict the performance of a system with an additional core to within 6%.
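To make the idea of a critical path with hidden overlap concrete, the following is a minimal Python sketch of a generic last-arriving-edge critical-path computation over a small event dependency graph. It is not the tool described in the abstract, which does not specify its algorithm at this level of detail. The event names, latencies, resource labels, and the functions `critical_path` and `time_by_label` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def critical_path(events, edges):
    """Return (total_time, path) for a dependency graph.

    events: event names in topological order (hypothetical input format).
    edges:  (src, dst, latency, label) tuples; label names the resource
            that the latency is charged to.
    """
    incoming = defaultdict(list)
    for src, dst, latency, label in edges:
        incoming[dst].append((src, latency, label))

    finish = {}         # time at which each event occurs
    last_arriving = {}  # the incoming edge that determined that time
    for ev in events:
        best, t = None, 0
        for src, latency, label in incoming[ev]:
            arrival = finish[src] + latency
            if best is None or arrival > t:
                t, best = arrival, (src, latency, label)
        finish[ev] = t
        last_arriving[ev] = best

    # Walk backwards from the final event along last-arriving edges.
    path = []
    ev = events[-1]
    while last_arriving[ev] is not None:
        src, latency, label = last_arriving[ev]
        path.append((src, ev, latency, label))
        ev = src
    path.reverse()
    return finish[events[-1]], path

def time_by_label(path):
    """Sum critical-path latency per resource label."""
    totals = defaultdict(int)
    for _, _, latency, label in path:
        totals[label] += latency
    return dict(totals)

# Illustrative data only: a packet whose DMA transfer overlaps CPU work.
events = ["packet_arrives", "dma_done", "irq_handled",
          "stack_start", "reply_sent"]
edges = [
    ("packet_arrives", "dma_done", 10, "dma"),
    ("packet_arrives", "irq_handled", 4, "cpu"),
    ("dma_done", "stack_start", 1, "queue"),
    ("irq_handled", "stack_start", 2, "cpu"),
    ("stack_start", "reply_sent", 5, "cpu"),
]

total, path = critical_path(events, edges)
breakdown = time_by_label(path)
```

In this made-up example the 6 time units of CPU work on the interrupt path do not appear in `breakdown`, because they are overlapped by the longer DMA transfer. This is the kind of hidden overhead the abstract refers to: speeding up that CPU work would not shorten the end-to-end time, while speeding up the DMA transfer would.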