SysProf: Online Distributed Behavior Diagnosis through Fine-grain System Monitoring

Authors:
Sandip Agarwala; Karsten Schwan
Affiliations:
Georgia Institute of Technology; Georgia Institute of Technology
Venue:
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
Year:
2006

Citing 0
Cited 9

Diagnosing distributed systems with self-propelled instrumentation

Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware
iManage: policy-driven self-management for enterprise-scale systems

Proceedings of the ACM/IFIP/USENIX 2007 International Conference on Middleware
IBMon: monitoring VMM-bypass capable InfiniBand devices using memory introspection

Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing
EbAT: online methods for detecting utility cloud anomalies

Proceedings of the 6th Middleware Doctoral Symposium
FaReS: Fair Resource Scheduling for VMM-Bypass InfiniBand Devices

CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Using computational intelligence to identify performance bottlenecks in a computer system

PPSN'10 Proceedings of the 11th international conference on Parallel problem solving from nature: Part I
Monere: monitoring of service compositions for failure diagnosis

ICSOC'11 Proceedings of the 9th international conference on Service-Oriented Computing
Performance troubleshooting in data centers: an annotated bibliography?

ACM SIGOPS Operating Systems Review

Abstract

Runtime monitoring is key to the effective management of enterprise and high-performance applications. To deal with the complex behaviors of today's multi-tier applications running across shared platforms, such monitoring must meet three criteria: (1) fine granularity, including the ability to track the resource usage of specific application behaviors such as individual client-server interactions; (2) real-time response, meaning the monitoring system can both capture and analyze currently needed monitoring information within the delays required for online management; and (3) enterprise-wide operation, meaning the monitoring information captured and analyzed must span the entire software stack and the set of machines involved in request generation, request forwarding, service provision, and return. This paper presents the SysProf system-level monitoring toolkit, which provides a flexible, low-overhead framework for enterprise-wide monitoring. The toolkit permits the capture of monitoring information at different levels of granularity, ranging from tracking the system-level activities triggered by a single system call, to capturing the client-server interactions associated with certain request classes, to characterizing the server resources consumed by sets of clients or client behaviors. The paper demonstrates the efficacy of SysProf by using it to manage two different enterprise applications: (1) detecting performance bottlenecks in a high-performance shared network file service, and (2) enforcing service level agreements in a multi-tier auctioning web site.
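The idea of viewing the same monitoring data at several granularities can be illustrated with a small sketch. This is not SysProf's interface, which the abstract does not describe. The `Sample` record, its fields, the `aggregate` helper, and the sample values are all hypothetical and chosen only for illustration. The sketch assumes each record describes the resources one node spent on one client-server interaction, then sums those records per interaction, per request class, and per client.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record: resources one node spent on one interaction."""
    interaction_id: int
    client: str
    request_class: str
    node: str
    cpu_ms: float
    bytes_sent: int


def aggregate(samples, key):
    """Sum CPU time and bytes sent, grouped by the Sample attribute `key`."""
    totals = defaultdict(lambda: {"cpu_ms": 0.0, "bytes_sent": 0})
    for s in samples:
        bucket = totals[getattr(s, key)]
        bucket["cpu_ms"] += s.cpu_ms
        bucket["bytes_sent"] += s.bytes_sent
    return dict(totals)


# Made-up data: three interactions crossing a web tier and a database tier.
SAMPLES = [
    Sample(1, "alice", "browse", "web", 2.0, 500),
    Sample(1, "alice", "browse", "db", 5.0, 1200),
    Sample(2, "bob", "bid", "web", 3.0, 400),
    Sample(2, "bob", "bid", "db", 9.0, 800),
    Sample(3, "alice", "bid", "web", 1.0, 300),
]

# The same records, viewed at three granularities.
per_interaction = aggregate(SAMPLES, "interaction_id")
per_class = aggregate(SAMPLES, "request_class")
per_client = aggregate(SAMPLES, "client")
```

Grouping by `interaction_id` combines the web-tier and database-tier records of a single request, which corresponds loosely to tracking an interaction across machines. Grouping by `request_class` or `client` gives the coarser views that a policy such as a service level agreement check might consume.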