An online service-oriented performance profiling tool for cloud computing systems

Authors:
Haibo Mi;Huaimin Wang;Yangfan Zhou;Michael Rung-Tsong Lyu;Hua Cai;Gang Yin
Affiliations:
National Lab for Parallel & Distributed Processing, National University of Defense Technology, Changsha, China 410073;National Lab for Parallel & Distributed Processing, National University of Defense Technology, Changsha, China 410073;Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China 518000;Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China 518000;Computing Platform, Alibaba Cloud Computing Company, Hangzhou, China 310000;National Lab for Parallel & Distributed Processing, National University of Defense Technology, Changsha, China 410073
Venue:
Frontiers of Computer Science: Selected Publications from Chinese Universities
Year:
2013

Abstract

The growing scale and complexity of component interactions in cloud computing systems pose great challenges for operators trying to understand the characteristics of system performance. Profiling has long proven to be an effective approach to performance analysis; however, existing approaches face new challenges that emerge in cloud computing systems. First, the efficiency of profiling becomes a critical concern; second, service-oriented profiling should be considered to support separation-of-concerns performance analysis. To address these issues, we present P-Tracer, an online performance profiling tool specifically tailored for cloud computing systems. First, P-Tracer constructs a dedicated search engine that proactively processes performance logs and generates an index for fast queries; second, for each service, P-Tracer derives statistical insight into performance characteristics along multiple dimensions and provides operators with a suite of web-based interfaces to query the critical information. We evaluate P-Tracer in terms of tracing overhead, data preprocessing scalability, and querying efficiency. Three real-world case studies from the Alibaba cloud computing platform demonstrate that P-Tracer can help operators understand software behaviors and localize the primary causes of performance anomalies effectively and efficiently.
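
The idea of preprocessing performance logs into an index and then answering per-service queries from it can be illustrated with a minimal sketch. This is not P-Tracer's implementation: the record layout, the service names, and the functions `build_index` and `service_summary` are all hypothetical, chosen only to show why building an index ahead of time makes per-service queries cheap.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records: (request_id, service, latency_ms). The field
# names and values are illustrative assumptions, not P-Tracer's log format.
LOGS = [
    ("r1", "frontend", 12.0),
    ("r1", "storage", 30.0),
    ("r2", "frontend", 8.0),
    ("r2", "storage", 90.0),
    ("r3", "frontend", 10.0),
]


def build_index(records):
    """Group latencies by service once, so later queries avoid a full log scan."""
    index = defaultdict(list)
    for _request_id, service, latency_ms in records:
        index[service].append(latency_ms)
    return index


def service_summary(index, service):
    """Return count, mean, and max latency for one service, or None if unseen."""
    latencies = index.get(service, [])
    if not latencies:
        return None
    return {
        "count": len(latencies),
        "mean_ms": mean(latencies),
        "max_ms": max(latencies),
    }


INDEX = build_index(LOGS)
SUMMARY = service_summary(INDEX, "storage")
```

In this sketch the index is built in one pass over the logs, and each query touches only the entries for the requested service. A real system would likely index further dimensions (time window, host, request path), which the abstract mentions only as "multiple dimensions" without specifying them.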