A flexible architecture integrating monitoring and analytics for managing large-scale data centers

Authors:
Chengwei Wang;Karsten Schwan;Vanish Talwar;Greg Eisenhauer;Liting Hu;Matthew Wolf
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, GA, USA;College of Computing, Georgia Institute of Technology, Atlanta, GA, USA;HP Labs, Palo Alto, CA, USA;College of Computing, Georgia Institute of Technology, Atlanta, GA, USA;College of Computing, Georgia Institute of Technology, Atlanta, GA, USA;College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
Venue:
Proceedings of the 8th ACM international conference on Autonomic computing
Year:
2011

Citing 26
Cited 10

Incremental clustering for dynamic information processing

ACM Transactions on Information Systems (TOIS)
Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering: (extended abstract)

SCG '94 Proceedings of the tenth annual symposium on Computational geometry
Bayesian Network Classifiers

Machine Learning - Special issue on learning with probabilistic representations
Pinpoint: Problem Determination in Large, Dynamic Internet Services

DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining

ACM Transactions on Computer Systems (TOCS)
Supermon: A High-Speed Cluster Monitoring System

CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
Network management with Nagios

Linux Journal
Performance debugging for distributed systems of black boxes

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Data streaming algorithms for efficient and accurate estimation of flow size distribution

Proceedings of the joint international conference on Measurement and modeling of computer systems
A scalable distributed information management system

Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Extensible, Scalable Monitoring for Clusters of Computers

LISA '97 Proceedings of the 11th USENIX conference on System administration
TAG: a Tiny AGgregation service for Ad-Hoc sensor networks

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Capturing, indexing, clustering, and retrieving system history

Proceedings of the twentieth ACM symposium on Operating systems principles
Using magpie for request extraction and workload modelling

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
E2EProf: Automated End-to-End Performance Management for Enterprise Systems

DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Towards highly reliable enterprise network services via inference of multi-level dependencies

Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications

Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Delay aware querying with Seaweed

The VLDB Journal — The International Journal on Very Large Data Bases
San Fermín: aggregating large data sets using a binomial swap forest

NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
A scalable, commodity data center network architecture

Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Moara: flexible and scalable group-based querying system

Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware
Performance comparison of middleware architectures for generating dynamic web content

Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
Monalytics: online monitoring and analytics for managing large scale data centers

Proceedings of the 7th international conference on Autonomic computing
The impact of management operations on the virtualized datacenter

Proceedings of the 37th annual international symposium on Computer architecture
CoolIT: coordinating facility and it management for efficient datacenters

HotPower'08 Proceedings of the 2008 conference on Power aware computing and systems
Look who's talking: discovering dependencies between virtual machines using CPU utilization

HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing

High end scientific codes with computational I/O pipelines: improving their end-to-end performance

Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities
In-situ I/O processing: a case for location flexibility

Proceedings of the sixth workshop on Parallel Data Storage
Net-cohort: detecting and managing VM ensembles in virtualized data centers

Proceedings of the 9th international conference on Autonomic computing
Project Hoover: auto-scaling streaming map-reduce applications

Proceedings of the 2012 workshop on Management of big data systems
Faster, larger, easier: reining real-time big data processing in cloud

Proceedings of the Posters and Demo Track
Towards an agent-based symbiotic architecture for autonomic management of virtualized data centers

Proceedings of the Winter Simulation Conference
VScope: middleware for troubleshooting time-sensitive data center applications

Proceedings of the 13th International Middleware Conference
Root cause detection in a service-oriented architecture

Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Cloud monitoring: A survey

Computer Networks: The International Journal of Computer and Telecommunications Networking
Performance troubleshooting in data centers: an annotated bibliography

ACM SIGOPS Operating Systems Review

Abstract

To manage large-scale data centers and utility clouds effectively, operators must understand current system and application behaviors. This requires continuous, real-time monitoring combined with online analysis of the data the monitoring system captures, i.e., integrated monitoring and analytics, termed Monalytics [28]. A key challenge in such integration is balancing the costs and delays incurred against the benefits of identifying and reacting to undesirable or non-performing system states in a timely fashion. This paper presents a novel, flexible architecture for Monalytics in which such trade-offs are easily made by dynamically constructing software overlays, called Distributed Computation Graphs (DCGs), that implement the desired analytics functions. A prototype of Monalytics implementing this architecture is evaluated with motivating use cases in small-scale data center experiments, and a series of analytical models is used to understand the above trade-offs at large scales. Results show that the approach provides the flexibility needed to meet the demands of autonomic management at large scale, with considerably better performance/cost than traditional and brute-force solutions.