Debugging and Performance Tuning for Parallel Computing Systems
Performance and scalability of EJB applications
OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A Runtime Monitoring Framework for the TAU Profiling System
ISCOPE '99 Proceedings of the Third International Symposium on Computing in Object-Oriented Parallel Environments
ACM Transactions on Computer Systems (TOCS)
Event Services for High Performance Computing
HPDC '00 Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing
Active Streams-An Approach to Adaptive Distributed Systems
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
A scalable distributed information management system
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Failure Diagnosis Using Decision Trees
ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Adaptive Control of Extreme-scale Stream Processing Systems
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
Microreboot — A technique for cheap recovery
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
MON: on-demand overlays for distributed system management
WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
E2EProf: Automated End-to-End Performance Management for Enterprise Systems
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Exploiting nonstationarity for performance prediction
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW
ICAC '06 Proceedings of the 2006 IEEE International Conference on Autonomic Computing
Moara: flexible and scalable group-based querying system
Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware
DataStager: scalable data staging services for petascale applications
Proceedings of the 18th ACM international symposium on High performance distributed computing
Ranking the importance of alerts for problem determination in large computer systems
ICAC '09 Proceedings of the 6th international conference on Autonomic computing
vManage: loosely coupled platform and virtualization management in data centers
ICAC '09 Proceedings of the 6th international conference on Autonomic computing
Automatically patching errors in deployed software
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Managing Variability in the IO Performance of Petascale Storage Systems
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Just in time: adding value to the IO pipelines of high performance applications with JITStaging
Proceedings of the 20th international symposium on High performance distributed computing
A flexible architecture integrating monitoring and analytics for managing large-scale data centers
Proceedings of the 8th ACM international conference on Autonomic computing
Usage patterns in multi-tenant data centers: a temporal perspective
Proceedings of the 9th international conference on Autonomic computing
Evaluating compressive sampling strategies for performance monitoring of data centers
Proceedings of the 9th international conference on Autonomic computing
Cloud monitoring: A survey
Computer Networks: The International Journal of Computer and Telecommunications Networking
Specialized storage for big numeric time series
HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
Performance troubleshooting in data centers: an annotated bibliography?
ACM SIGOPS Operating Systems Review
To manage large-scale data centers and utility clouds effectively, operators must understand current system and application behaviors. This requires continuous monitoring together with online analysis of the data the monitoring system captures. There is therefore a need for systems in which both tasks are performed in an integrated fashion, so that they can better drive online system management. Coining the term 'monalytics' for the combined monitoring and analysis systems used to manage large-scale data center systems, this paper articulates principles for monalytics systems, describes software approaches for implementing them, and provides experimental evaluations that justify both the principles and the implementation approach. Specific technical contributions include consideration of scalability across both 'space' and 'time', the ability to dynamically deploy and adjust monalytics functionality at multiple levels of abstraction in target systems, and the capability to operate across the range of layers, from application to hypervisor, present in large-scale data center or cloud computing systems. Our monalytics implementation targets virtualized systems and cloud infrastructures by integrating its functionality into the Xen hypervisor.