Full-System Critical Path Analysis

  • Authors:
  • Ali G. Saidi, Nathan L. Binkert, Steven K. Reinhardt, Trevor Mudge

  • Affiliations:
  • Ali G. Saidi: The University of Michigan, Department of EECS (saidi@eecs.umich.edu)
  • Nathan L. Binkert: Hewlett-Packard Labs, Palo Alto, California (binkert@hp.com)
  • Steven K. Reinhardt: The University of Michigan, Department of EECS / Reservoir Labs, Portland, Oregon (stever@reservoir.com)
  • Trevor Mudge: The University of Michigan, Department of EECS (tnm@eecs.umich.edu)

  • Venue:
  • ISPASS '08: Proceedings of the 2008 IEEE International Symposium on Performance Analysis of Systems and Software
  • Year:
  • 2008

Abstract

Many interesting workloads today are limited not by CPU processing power but by the interactions between the CPU, memory system, I/O devices, and the complex software that ties all the components together. Optimizing these workloads requires identifying performance bottlenecks across concurrent hardware components and across multiple layers of software. Common software profiling techniques cannot account for hardware bottlenecks or situations where software overheads are hidden due to overlap with hardware operations. Critical-path analysis is a powerful approach for identifying bottlenecks in highly concurrent systems, but typically requires detailed domain knowledge to construct the required event dependence graphs. As a result, to date it has been applied only to isolated system layers (e.g., processor microarchitectures or message-passing applications). In this paper we present a novel technique for applying critical-path analysis to complex systems composed of numerous interacting state machines. We avoid tedious up-front modeling by using control-flow tracing to expose implicit software state machines automatically, and iterative refinement to add necessary manual annotations with minimal effort. By applying our technique within a full-system simulator, we achieve an integrated trace of hardware and software events with minimal perturbation. As a result, we can perform this analysis across the user/kernel and hardware/software boundaries and even across multiple systems. We apply this technique to analyzing network performance, and show that we are able to find performance bottlenecks in both hardware and software, including some surprising bottlenecks in the Linux 2.6.13 kernel.
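To make the core idea concrete, below is a minimal sketch of critical-path analysis over an event dependence graph, in the spirit of the abstract. It is not the paper's actual data model or tooling: the event names, the edge list, and the latencies are hypothetical illustrations of a packet-receive sequence with concurrent hardware (NIC DMA, interrupt) and software (driver, IP stack, socket) activity. Events are nodes, dependence arcs carry latencies, and the critical path is the longest weighted path through the resulting DAG.

```python
# Sketch: critical path = longest weighted path through a DAG of
# dependent events. All event names and latencies below are
# hypothetical, for illustration only.
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

def critical_path(edges):
    """edges: iterable of (src, dst, latency) dependence arcs.
    Returns (total_latency, [events on the critical path])."""
    succs = defaultdict(list)   # node -> [(successor, latency)]
    preds = defaultdict(set)    # node -> {predecessors}, for topo sort
    for src, dst, lat in edges:
        succs[src].append((dst, lat))
        preds[dst].add(src)

    dist = defaultdict(int)     # longest distance reaching each event
    back = {}                   # predecessor on that longest path
    for node in TopologicalSorter(preds).static_order():
        for nxt, lat in succs[node]:
            if dist[node] + lat > dist[nxt]:
                dist[nxt] = dist[node] + lat
                back[nxt] = node

    end = max(dist, key=dist.get)       # latest-finishing event
    path, cur = [end], end
    while cur in back:                   # walk the path backwards
        cur = back[cur]
        path.append(cur)
    return dist[end], path[::-1]

# Hypothetical packet-receive events; latencies in nanoseconds.
edges = [
    ("nic.rx", "nic.dma", 500),
    ("nic.rx", "nic.irq", 200),
    ("nic.dma", "driver.poll", 300),
    ("nic.irq", "driver.poll", 900),  # interrupt path dominates DMA
    ("driver.poll", "ip.recv", 400),
    ("ip.recv", "socket.wakeup", 250),
]
latency, path = critical_path(edges)
print(f"{latency} ns: {' -> '.join(path)}")
# 1750 ns: nic.rx -> nic.irq -> driver.poll -> ip.recv -> socket.wakeup
```

Note how the DMA transfer (500 ns) never appears on the critical path: it is fully overlapped by the slower interrupt-delivery chain, which is exactly the kind of hidden-overlap situation the abstract argues ordinary software profilers cannot attribute correctly.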