Self-correlating predictive information tracking for large-scale production systems

Authors:
Ying Zhao;Yongmin Tan;Zhenhuan Gong;Xiaohui Gu;Mike Wamboldt
Affiliations:
Tsinghua University, Beijing, China;North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;North Carolina State University, Raleigh, NC, USA;IBM RTP, Durham, NC, USA
Venue:
ICAC '09 Proceedings of the 6th international conference on Autonomic computing
Year:
2009

Abstract

Automatic management of large-scale production systems requires a continuous monitoring service that tracks the state of the managed system. However, it is challenging to achieve both scalability and high information precision while continuously monitoring a large number of distributed, time-varying metrics. In this paper, we present a new self-correlating, predictive information tracking system called InfoTrack, which employs lightweight temporal and spatial correlation discovery methods to minimize continuous monitoring cost. InfoTrack combines metric value prediction within individual nodes with adaptive clustering among distributed nodes to suppress remote information updates in distributed system monitoring. We have implemented a prototype of InfoTrack and deployed it on PlanetLab. We evaluated its performance using both real system traces and micro-benchmark prototype experiments. The experimental results show that InfoTrack can reduce the continuous monitoring cost by 50-90% while maintaining high information precision (i.e., within an error bound of 0.01-0.05).
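
The idea of suppressing remote updates through per-node prediction can be illustrated with a minimal sketch. This is not InfoTrack's actual algorithm: it assumes a simple last-value predictor shared by the node and the monitor, and the names `SuppressingReporter`, `observe`, and `replay` are hypothetical. A node sends a new value only when the prediction would fall outside the error bound.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class SuppressingReporter:
    """Hypothetical node-side reporter (illustrative, not InfoTrack's method)."""
    error_bound: float
    last_sent: Optional[float] = None
    updates_sent: int = 0

    def observe(self, value: float) -> bool:
        """Return True if this value must be sent to the remote monitor."""
        # Assumed predictor: the metric stays at the last reported value.
        # The monitor makes the same prediction, so no message is needed
        # while the true value stays within the error bound of it.
        if self.last_sent is not None and abs(value - self.last_sent) <= self.error_bound:
            return False  # prediction is accurate enough: suppress the update
        self.last_sent = value
        self.updates_sent += 1
        return True


def replay(trace: Iterable[float], error_bound: float) -> Tuple[List[bool], int]:
    """Feed a metric trace through the reporter; return send flags and update count."""
    reporter = SuppressingReporter(error_bound)
    flags = [reporter.observe(v) for v in trace]
    return flags, reporter.updates_sent


if __name__ == "__main__":
    # Made-up normalized metric samples, for illustration only.
    flags, sent = replay([0.50, 0.51, 0.52, 0.60, 0.61, 0.40], error_bound=0.05)
    print(flags, sent)
```

A larger error bound suppresses more updates at the cost of precision, which is the trade-off the abstract describes. The spatial side (adaptive clustering of correlated nodes) is not covered by this sketch.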