Services running in a data center frequently rely on RPCs to child services (e.g., storage, cache, authentication), and their latency depends crucially on the latencies of those RPCs. However, even though service latency often comes exclusively from the time spent inside remote calls, it is difficult to determine parent latency, because multithreading and asynchronous RPCs lead to complex, non-linear dependencies between service and RPC latencies. In this paper, we present a model that can be used to estimate parent latency given RPC latencies, where the parallel dependencies among child services are modeled by an "execution flow", a directed acyclic graph. The model is learned from samples collected by a distributed tracing tool. Experiments demonstrate that these models predict top-level parent latency from child latencies more accurately than state-of-the-art baselines such as linear regression and critical path analysis.
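To make the non-linear dependency concrete, the following minimal sketch evaluates a hand-written execution flow, assuming that an RPC starts as soon as all of its predecessors have finished and that the parent does no work of its own. It is an illustration only, not the learned model described in the abstract: the function name `estimate_parent_latency`, the flow, and the latency values are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def estimate_parent_latency(deps, latency_ms):
    """Estimate parent latency from child RPC latencies.

    deps maps each RPC name to the set of RPCs that must finish before
    it starts (an illustrative "execution flow" DAG). latency_ms maps
    each RPC name to its latency in milliseconds.
    """
    finish = {}
    # static_order() yields every RPC after all of its predecessors.
    for rpc in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(rpc, ())), default=0.0)
        finish[rpc] = start + latency_ms[rpc]
    # Assumption: the parent returns once the last RPC has finished.
    return max(finish.values(), default=0.0)


# Hypothetical flow: auth runs first, then cache and storage run in parallel.
flow = {"auth": set(), "cache": {"auth"}, "storage": {"auth"}}
latency = {"auth": 5.0, "cache": 2.0, "storage": 20.0}
parent = estimate_parent_latency(flow, latency)

# Slowing the cache from 2 ms to 10 ms leaves the parent latency unchanged,
# because storage still dominates the parallel branch.
parent_slow_cache = estimate_parent_latency(flow, {**latency, "cache": 10.0})
```

In this sketch, sequential RPCs add their latencies while parallel RPCs contribute only their maximum, which is why a linear combination of child latencies cannot describe the parent latency in general.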