End-to-end performance forecasting: finding bottlenecks before they happen

Authors:
Ali G. Saidi;Nathan L. Binkert;Steven K. Reinhardt;Trevor Mudge
Affiliations:
The University of Michigan, Ann Arbor, MI, USA;Hewlett-Packard, Palo Alto, CA, USA;Advanced Micro Devices, Bellevue, WA, USA;The University of Michigan, Ann Arbor, MI, USA
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009

Citing 17
Cited 3

Parallel program performance metrics: a comparison and validation

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
An online computation of critical path profiling

SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Generating representative Web workloads for network and server performance evaluation

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Critical path analysis of TCP transactions

Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Slack: maximizing performance under technological constraints

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Quantifying Instruction Criticality

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Quantifying instruction criticality for shared memory multiprocessors

Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
Performance debugging for distributed systems of black boxes

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Using Interaction Costs for Microarchitectural Bottleneck Analysis

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Vertical profiling: understanding the behavior of object-oriented applications

OOPSLA '04 Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Optimizing 10-Gigabit Ethernet for Networks of Workstations, Clusters, and Grids: A Case Study

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Hidden in Plain Sight

Queue - Performance
The M5 Simulator: Modeling Networked Systems

IEEE Micro
Understanding and visualizing full systems with data flow tomography

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Full-System Critical Path Analysis

ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software
Processor Performance Modeling using Symbolic Simulation

ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software

Criticality-driven superscalar design space exploration

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior

Proceedings of the 40th Annual International Symposium on Computer Architecture

Abstract

Many important workloads today, such as web-hosted services, are limited not by processor core performance but by interactions among the cores, the memory system, I/O devices, and the complex software layers that tie these components together. Architects designing future systems for these workloads are challenged to identify performance bottlenecks because, as in any concurrent system, overheads in one component may be hidden due to overlap with other operations. These overlaps span the user/kernel and software/hardware boundaries, making traditional performance analysis techniques inadequate. We present a methodology for identifying end-to-end critical paths across software and simulated hardware in complex networked systems. By modeling systems as collections of state machines interacting via queues, we can trace critical paths through multiplexed processing engines, identify when resources create bottlenecks (including abstract resources such as flow-control credits), and predict the benefit of eliminating bottlenecks by increasing hardware speeds or expanding available resources. We implement our technique in a full-system simulator and analyze a TCP microbenchmark, a web server, the Linux TCP/IP stack, and an Ethernet controller. From a single run of the microbenchmark, our tool, within minutes, correctly identifies a series of bottlenecks and predicts the performance of hypothetical systems in which these bottlenecks are successively eliminated, culminating in a total speedup of 3X. We then validate these predictions through hours of additional simulation, and find them to be accurate within 1% to 17%. We also analyze the web server, find it to be CPU-bound, and predict the performance of a system with an additional core within 6%.
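
The abstract's central idea (find the critical path through a graph of dependent events, then predict how end-to-end time changes when one resource is made faster) can be illustrated with a small sketch. This is not the authors' tool: the edge format, the resource names ("nic", "cpu", "memory"), the latencies, and the functions `critical_path` and `predict` are all hypothetical, chosen only to show why overlap can hide part of a bottleneck's cost.

```python
from collections import defaultdict

def critical_path(edges):
    """Longest path through a DAG of events.

    edges: list of (src, dst, latency, resource) tuples; latencies are
    assumed positive. Returns (total_latency, edges_on_the_path).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for edge in edges:
        src, dst, _, _ = edge
        succ[src].append(edge)
        indeg[dst] += 1
        nodes.update((src, dst))

    # Visit nodes in topological order (Kahn's algorithm), keeping the
    # latest finish time of each event and the edge that produced it.
    ready = [n for n in nodes if indeg[n] == 0]
    finish = {n: 0.0 for n in nodes}
    best_in = {}
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for edge in succ[node]:
            _, dst, latency, _ = edge
            if finish[node] + latency > finish[dst]:
                finish[dst] = finish[node] + latency
                best_in[dst] = edge
            indeg[dst] -= 1
            if indeg[dst] == 0:
                ready.append(dst)
    if visited != len(nodes):
        raise ValueError("event graph contains a cycle")

    # Walk backwards from the last event to recover the path itself.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node in best_in:
        path.append(best_in[node])
        node = best_in[node][0]
    path.reverse()
    return total, path

def predict(edges, resource, speedup):
    """Predicted end-to-end time if `resource` were `speedup` times faster."""
    scaled = [(s, d, lat / speedup if r == resource else lat, r)
              for s, d, lat, r in edges]
    return critical_path(scaled)[0]

# Hypothetical event graph: a CPU-bound path overlapped with a memory path.
EDGES = [
    ("start", "nic_rx", 2.0, "nic"),
    ("nic_rx", "cpu_done", 5.0, "cpu"),
    ("start", "dma_done", 4.0, "memory"),
    ("dma_done", "cpu_done", 1.0, "memory"),
    ("cpu_done", "end", 1.0, "nic"),
]

total, path = critical_path(EDGES)
print(total, [resource for _, _, _, resource in path])
print(predict(EDGES, "cpu", 2.0))
```

In this toy graph the critical path runs through the CPU and takes 8.0 time units. Doubling CPU speed would shorten that path to 5.5, but the prediction is 6.0 because the previously hidden memory path becomes critical, which is the kind of successive bottleneck the abstract describes.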