Monalytics: online monitoring and analytics for managing large scale data centers

Authors:
Mahendra Kutare;Greg Eisenhauer;Chengwei Wang;Karsten Schwan;Vanish Talwar;Matthew Wolf
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA;Hewlett Packard Labs, Palo Alto, CA, USA;Georgia Institute of Technology, Atlanta, GA, USA
Venue:
Proceedings of the 7th international conference on Autonomic computing
Year:
2010

Abstract

To effectively manage large-scale data centers and utility clouds, operators must understand current system and application behaviors. This requires continuous monitoring along with online analysis of the data captured by the monitoring system. As a result, there is a need to move to systems in which both tasks are performed in an integrated fashion and are thereby better able to drive online system management. Coining the term 'monalytics' to refer to the combined monitoring and analysis systems used for managing large-scale data center systems, this paper articulates principles for monalytics systems, describes software approaches for implementing them, and provides experimental evaluations justifying both the principles and the implementation approach. Specific technical contributions include consideration of scalability across both 'space' and 'time', the ability to dynamically deploy and adjust monalytics functionality at multiple levels of abstraction in target systems, and the capability to operate across the range of layers, from application to hypervisor, present in large-scale data center or cloud computing systems. Our monalytics implementation targets virtualized systems and cloud infrastructures through integration of its functionality into the Xen hypervisor.