Precise, Scalable, and Online Request Tracing for Multitier Services of Black Boxes

  • Authors:
  • Bo Sang;Jianfeng Zhan;Gang Lu;Haining Wang;Dongyan Xu;Lei Wang;Zhihong Zhang;Zhen Jia

  • Affiliations:
  • Purdue University, West Lafayette;Chinese Academy of Sciences, Beijing;Chinese Academy of Sciences, Beijing;College of William and Mary, Williamsburg;Purdue University, West Lafayette;Chinese Academy of Sciences, Beijing;Chinese Academy of Sciences, Beijing;Chinese Academy of Sciences, Beijing

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2012

Abstract

As more and more multitier services are developed from commercial off-the-shelf components or heterogeneous middleware without source code available, both developers and administrators need a request tracing tool to 1) know exactly how a user request of interest travels through services of black boxes and 2) obtain macrolevel user request behaviors of services without manually analyzing massive logs. This need is further exacerbated by IT system "agility," which requires the tracing tool to provide online performance data, since offline approaches cannot reflect system changes in real time. Moreover, considering the large scale of deployed services, a pragmatic tracing approach should be scalable in terms of the cost of collecting and analyzing logs. In this paper, we introduce a precise, scalable, and online request tracing tool for multitier services of black boxes. Our contributions are threefold. First, we propose a precise request tracing algorithm for multitier services of black boxes, which uses only application-independent knowledge. Second, we present a microlevel abstraction, the component activity graph, to represent the causal paths of each request. On the basis of this abstraction, we use dominated causal path patterns to represent repeatedly executed causal paths that account for significant fractions, and we further present a derived performance metric of causal path patterns, latency percentages of components, to enable debugging performance-in-the-large. Third, we develop two mechanisms, tracing on demand and sampling, to significantly increase the system scalability. We implement a prototype of the proposed system, called PreciseTracer, and release it as open source code. In comparison with WAP5, a black-box tracing approach, PreciseTracer achieves higher tracing accuracy and faster response time. Our experimental results also show that PreciseTracer has low overhead and still achieves high tracing accuracy even if an aggressive sampling policy is adopted, indicating that PreciseTracer is a promising tracing tool for large-scale production systems.
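
The abstract names "latency percentages of components" as a metric derived from causal path patterns but does not define it here. As a rough illustration only, the sketch below assumes a causal path can be flattened into a list of (component, start, end) activity segments and that a component's latency percentage is its share of the summed segment time. The function name `latency_percentages`, the tuple layout, and the component names are all hypothetical and are not taken from PreciseTracer.

```python
from collections import defaultdict


def latency_percentages(path):
    """Return each component's share (in percent) of the total time on one path.

    `path` is an assumed representation: a list of
    (component_name, start_timestamp, end_timestamp) tuples.
    """
    per_component = defaultdict(float)
    for component, start, end in path:
        if end < start:
            raise ValueError(f"segment for {component!r} ends before it starts")
        # Accumulate time spent in this component across all its segments.
        per_component[component] += end - start

    total = sum(per_component.values())
    if total == 0:
        # No measurable time: report zero for every component seen.
        return {component: 0.0 for component in per_component}
    return {
        component: 100.0 * spent / total
        for component, spent in per_component.items()
    }


# Hypothetical three-tier request: web server, application server, database.
example_path = [
    ("httpd", 0.0, 2.0),
    ("java", 2.0, 5.0),
    ("mysqld", 5.0, 10.0),
]

if __name__ == "__main__":
    for component, share in latency_percentages(example_path).items():
        print(f"{component}: {share:.1f}%")
```

Under these assumptions, a component whose percentage grows over time relative to the others would be a natural first place to look when debugging a slowdown.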