VScope: middleware for troubleshooting time-sensitive data center applications

Authors:
Chengwei Wang (Georgia Institute of Technology); Infantdani Abel Rayan (Riot Games); Greg Eisenhauer (Georgia Institute of Technology); Karsten Schwan (Georgia Institute of Technology); Vanish Talwar (HP Labs); Matthew Wolf (Georgia Institute of Technology); Chad Huneycutt (Georgia Institute of Technology)
Venue:
Proceedings of the 13th International Middleware Conference
Year:
2012


Abstract

Data-intensive infrastructures are increasingly used for on-line processing of live data to guide operations and decision making. VScope is a flexible monitoring and analysis middleware for troubleshooting such large-scale, time-sensitive, multi-tier applications. With VScope, lightweight anomaly detection and interaction tracking methods can run continuously throughout an application's execution. The runtime events these methods generate can then trigger more detailed, heavier-weight analyses, which are dynamically deployed where they are most likely to be fruitful for root cause diagnosis and mitigation. We comprehensively evaluate a VScope prototype in a virtualized data center environment with over 1000 virtual machines (VMs), and apply VScope to a representative on-line log processing application. Experimental results show that VScope can deploy and operate a variety of on-line analytics functions and metrics within a few seconds at large scale. Compared to traditional logging approaches, VScope-based troubleshooting has substantially lower perturbation and generates much smaller log data volumes. It can also resolve complex cross-tier or cross-software-level issues that application-level or per-tier mechanisms alone cannot solve.
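The central mechanism described in the abstract can be illustrated in miniature. Cheap detectors run continuously on every node, and a heavier analysis is launched only on the nodes where a detector fires. The sketch below is a minimal, single-process illustration under stated assumptions. It is not VScope's implementation or API. The sliding-window z-score detector, the 20-sample window, the 3.0 threshold, and the names `ZScoreDetector`, `TriggeredMonitor`, and `heavy_analysis` are all hypothetical. The distributed deployment step is reduced to a local callback.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List


@dataclass
class ZScoreDetector:
    """Hypothetical always-on detector: flags a sample whose distance from the
    sliding-window mean exceeds `threshold` population standard deviations."""
    window: int = 20
    threshold: float = 3.0
    history: deque = field(default_factory=deque)

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= self.window:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Guard against a perfectly flat window (sigma == 0).
            anomalous = sigma > 0 and abs(value - mu) / sigma > self.threshold
            self.history.popleft()  # keep the window at a fixed size
        self.history.append(value)
        return anomalous


@dataclass
class TriggeredMonitor:
    """Hypothetical coordinator: runs one cheap detector per node and escalates
    to `heavy_analysis` only on nodes whose detector fires (once per node)."""
    heavy_analysis: Callable[[str], None]
    detectors: Dict[str, ZScoreDetector] = field(default_factory=dict)
    deployed: List[str] = field(default_factory=list)

    def ingest(self, node: str, value: float) -> None:
        detector = self.detectors.setdefault(node, ZScoreDetector())
        if detector.observe(value) and node not in self.deployed:
            # Escalation: the expensive step runs only where it looks fruitful.
            self.deployed.append(node)
            self.heavy_analysis(node)


# Usage: three simulated VMs report latency samples; vm-002 spikes once.
reports: List[str] = []
monitor = TriggeredMonitor(heavy_analysis=lambda n: reports.append(f"deep-profile {n}"))
for t in range(30):
    for node in ("vm-001", "vm-002", "vm-003"):
        latency = 10.0 + (t % 3) * 0.5  # normal jitter
        if node == "vm-002" and t == 25:
            latency = 80.0  # injected anomaly
        monitor.ingest(node, latency)

print(monitor.deployed)  # ['vm-002']
```

Because the expensive step runs only on nodes whose detector fires, its cost grows with the number of anomalous nodes rather than with the size of the whole fleet. In miniature, this mirrors the rationale the abstract gives for lower perturbation and smaller log volumes than always-on detailed logging.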