Modeling the parallel execution of black-box services

Authors:
Gideon Mann; Mark Sandler; Darja Krushevskaja; Sudipto Guha; Eyal Even-Dar
Affiliations:
Google Inc., New York, NY; Google Inc., New York, NY; Rutgers University, New Brunswick, NJ; University of Pennsylvania, Philadelphia, PA; Final Inc., Herzliya-Pituach, Israel
Venue:
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Year:
2011

Abstract

Services running in a data center frequently rely on RPCs to child services (e.g., storage, cache, authentication), and their latency depends crucially on the latencies of those RPCs. However, even though a service's latency often comes exclusively from time spent inside remote calls, it is difficult to determine parent latency, because multithreading and asynchronous RPCs lead to complex, non-linear dependencies between service and RPC latencies. In this paper, we present a model that estimates parent latency from RPC latencies, in which the parallel dependencies among child services are modeled by an "execution flow", a directed acyclic graph. The model is learned from samples collected by a distributed tracing tool. Experiments demonstrate that these models predict top-level parent latency from child latencies better than state-of-the-art baselines such as linear regression and critical path analysis.
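To make the "execution flow" idea concrete, here is a minimal sketch under stated assumptions: the flow is already given as a DAG in which an RPC starts only after all of its predecessors finish, and the parent returns when its last RPC completes. Under those assumptions, sequential RPCs add and parallel RPCs combine by `max`, which is the kind of non-linear dependency a linear model cannot capture. The names `predict_parent_latency`, `deps`, and `rpc_latency` are hypothetical, the latencies are made up, and the sketch does not attempt the paper's step of learning the flow from traces.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, Mapping


def predict_parent_latency(
    deps: Mapping[str, Iterable[str]],
    rpc_latency: Mapping[str, float],
) -> float:
    """Hypothetical helper, not the paper's implementation.

    deps maps each RPC to the RPCs that must finish before it starts.
    RPCs with no predecessors start at time 0. Raises graphlib.CycleError
    if deps is not acyclic.
    """
    finish: Dict[str, float] = {}
    # static_order() yields every node only after all of its predecessors.
    for node in TopologicalSorter(deps).static_order():
        # An RPC starts when its slowest predecessor finishes (parallel -> max).
        start = max((finish[p] for p in deps.get(node, ())), default=0.0)
        # Its own latency is then added on top (sequential -> sum).
        finish[node] = start + rpc_latency[node]
    # Assumption: the parent returns once its last RPC completes.
    return max(finish.values(), default=0.0)


if __name__ == "__main__":
    # auth runs first; storage and cache run in parallel; render waits for both.
    deps = {
        "auth": [],
        "storage": ["auth"],
        "cache": ["auth"],
        "render": ["storage", "cache"],
    }
    latency = {"auth": 5.0, "storage": 20.0, "cache": 3.0, "render": 2.0}  # ms, made up
    print(predict_parent_latency(deps, latency))  # 27.0
    print(sum(latency.values()))                  # 30.0, naive sum overestimates
```

Speeding up `storage` from 20 ms to 2 ms in this toy flow lowers the prediction from 27 ms to 10 ms, not by the full 18 ms, because `cache` becomes the slower parallel branch. A fixed linear combination of child latencies cannot reproduce that switch, which is one intuition for why a flow-aware model can outperform a linear baseline.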