Incremental clustering for dynamic information processing
ACM Transactions on Information Systems (TOIS)
SCG '94 Proceedings of the tenth annual symposium on Computational geometry
Machine Learning - Special issue on learning with probabilistic representations
Pinpoint: Problem Determination in Large, Dynamic Internet Services
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
ACM Transactions on Computer Systems (TOCS)
Supermon: A High-Speed Cluster Monitoring System
CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
Network management with Nagios
Linux Journal
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Data streaming algorithms for efficient and accurate estimation of flow size distribution
Proceedings of the joint international conference on Measurement and modeling of computer systems
A scalable distributed information management system
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Extensible, Scalable Monitoring for Clusters of Computers
LISA '97 Proceedings of the 11th USENIX conference on System administration
TAG: a Tiny AGgregation service for Ad-Hoc sensor networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Capturing, indexing, clustering, and retrieving system history
Proceedings of the twentieth ACM symposium on Operating systems principles
Using magpie for request extraction and workload modelling
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
E2EProf: Automated End-to-End Performance Management for Enterprise Systems
DSN '07 Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Delay aware querying with Seaweed
The VLDB Journal — The International Journal on Very Large Data Bases
San Fermín: aggregating large data sets using a binomial swap forest
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
A scalable, commodity data center network architecture
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Moara: flexible and scalable group-based querying system
Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware
Performance comparison of middleware architectures for generating dynamic web content
Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware
Monalytics: online monitoring and analytics for managing large scale data centers
Proceedings of the 7th international conference on Autonomic computing
The impact of management operations on the virtualized datacenter
Proceedings of the 37th annual international symposium on Computer architecture
CoolIT: coordinating facility and it management for efficient datacenters
HotPower'08 Proceedings of the 2008 conference on Power aware computing and systems
Look who's talking: discovering dependencies between virtual machines using CPU utilization
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
High end scientific codes with computational I/O pipelines: improving their end-to-end performance
Proceedings of the 2nd international workshop on Petascal data analytics: challenges and opportunities
In-situ I/O processing: a case for location flexibility
Proceedings of the sixth workshop on Parallel Data Storage
Net-cohort: detecting and managing VM ensembles in virtualized data centers
Proceedings of the 9th international conference on Autonomic computing
Project Hoover: auto-scaling streaming map-reduce applications
Proceedings of the 2012 workshop on Management of big data systems
Faster, larger, easier: reining real-time big data processing in cloud
Proceedings of the Posters and Demo Track
Towards an agent-based symbiotic architecture for autonomic management of virtualized data centers
Proceedings of the Winter Simulation Conference
VScope: middleware for troubleshooting time-sensitive data center applications
Proceedings of the 13th International Middleware Conference
Root cause detection in a service-oriented architecture
Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
Survey Cloud monitoring: A survey
Computer Networks: The International Journal of Computer and Telecommunications Networking
Performance troubleshooting in data centers: an annotated bibliography?
ACM SIGOPS Operating Systems Review
Hi-index | 0.00 |
To effectively manage large-scale data centers and utility clouds, operators must understand current system and application behaviors. This requires continuous, real-time monitoring along with on-line analysis of the data captured by the monitoring system, i.e., integrated monitoring and analytics -- Monalytics [28]. A key challenge with such integration is to balance the costs incurred and associated delays, against the benefits attained from identifying and reacting to, in a timely fashion, undesirable or non-performing system states. This paper presents a novel, flexible architecture for Monalytics in which such trade-offs are easily made by dynamically constructing software overlays called Distributed Computation Graphs (DCGs) to implement desired analytics functions. The prototype of Monalytics implementing this flexible architecture is evaluated with motivating use cases in small scale data center experiments, and a series of analytical models is used to understand the above trade-offs at large scales. Results show that the approach provides the flexibility needed to meet the demands of autonomic management at large scale with considerably better performance/cost than traditional and brute force solutions.