End-to-end performance forecasting: finding bottlenecks before they happen

  • Authors:
  • Ali G. Saidi;Nathan L. Binkert;Steven K. Reinhardt;Trevor Mudge

  • Affiliations:
  • The University of Michigan, Ann Arbor, MI, USA;Hewlett-Packard, Palo Alto, CA, USA;Advanced Micro Devices, Bellevue, WA, USA;The University of Michigan, Ann Arbor, MI, USA

  • Venue:
  • Proceedings of the 36th annual international symposium on Computer architecture
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many important workloads today, such as web-hosted services, are limited not by processor core performance but by interactions among the cores, the memory system, I/O devices, and the complex software layers that tie these components together. Architects designing future systems for these workloads are challenged to identify performance bottlenecks because, as in any concurrent system, overheads in one component may be hidden due to overlap with other operations. These overlaps span the user/kernel and software/hardware boundaries, making traditional performance analysis techniques inadequate. We present a methodology for identifying end-to-end critical paths across software and simulated hardware in complex networked systems. By modeling systems as collections of state machines interacting via queues, we can trace critical paths through multiplexed processing engines, identify when resources create bottlenecks (including abstract resources such as flow-control credits), and predict the benefit of eliminating bottlenecks by increasing hardware speeds or expanding available resources. We implement our technique in a full-system simulator and analyze a TCP microbenchmark, a web server, the Linux TCP/IP stack, and an Ethernet controller. From a single run of the microbenchmark, our tool--within minutes--correctly identifies a series of bottlenecks, and predicts the performance of hypothetical systems in which these bottlenecks are successively eliminated, culminating in a total speedup of 3X.We then validate these predictions through hours of additional simulation, and find them to be accurate within 1--17%. We also analyze the web server, find it to be CPU-bound, and predict the performance of a system with an additional core within 6%.