Frameworks for large-scale data-intensive applications, such as Hadoop and Dryad, have gained tremendous popularity. Understanding the resource requirements of these frameworks and the performance characteristics of distributed applications is inherently difficult. We present an approach, based on resource attribution, that aims to facilitate performance analysis of distributed data-intensive applications. This approach is embodied in Otus, a monitoring tool that attributes resource usage to jobs and services in Hadoop clusters. Otus collects and correlates performance metrics from distributed components and provides views that display time series of these metrics, filtered and aggregated using multiple criteria. Our evaluation shows that this approach can be deployed without incurring major overheads. Our experience with Otus in a production cluster suggests that it is effective at helping users and cluster administrators with application performance analysis and troubleshooting.
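The idea of resource attribution can be illustrated with a minimal sketch. It is not Otus's implementation: the `Sample` record, the `pid_to_job` mapping, and the function names are all hypothetical, and the sketch assumes that per-process samples have already been collected from each node. It shows only how such samples might be correlated with jobs and aggregated into a per-job time series.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One per-process measurement (hypothetical record layout)."""
    timestamp: int      # collection time, in seconds
    node: str           # host that reported the sample
    pid: int            # process id on that host
    cpu_seconds: float  # CPU time used during the interval
    rss_bytes: int      # resident memory at collection time


def attribute(samples, pid_to_job):
    """Sum per-process samples into per-(job, timestamp) totals.

    pid_to_job maps (node, pid) to a job or service name; processes
    missing from the mapping are grouped under "unattributed".
    """
    totals = defaultdict(lambda: {"cpu_seconds": 0.0, "rss_bytes": 0})
    for s in samples:
        job = pid_to_job.get((s.node, s.pid), "unattributed")
        bucket = totals[(job, s.timestamp)]
        bucket["cpu_seconds"] += s.cpu_seconds
        bucket["rss_bytes"] += s.rss_bytes
    return dict(totals)


def time_series(totals, job, metric):
    """Return the (timestamp, value) pairs of one metric for one job, sorted by time."""
    return sorted((ts, v[metric]) for (j, ts), v in totals.items() if j == job)
```

Filtering by other criteria, such as node or service type, would follow the same pattern with a different grouping key.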