Artemis is a modular application designed for analyzing and troubleshooting the performance of large clusters running datacenter services. Artemis is composed of four modules: (1) distributed log collection and data extraction, (2) a database storing the extracted data, (3) an interactive visualization tool for exploring the data, and (4) a plug-in interface (and a set of sample plug-ins) allowing users to implement data analysis tools, including (a) the extraction and construction of new features from the basic measurements collected, and (b) the implementation and invocation of statistical and machine learning algorithms and tools. In this paper we describe each of these components and then illustrate the power of the plug-in architecture by presenting a case study in which Artemis is used to analyze a Dryad application running on a 240-machine cluster.
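To make the plug-in idea in module (4) concrete, the following is a minimal sketch of how a feature-construction plug-in (a) and a statistical-analysis plug-in (b) might be chained over rows pulled from the database. It is an illustration only: the abstract does not specify the Artemis plug-in API, so every name here (`Plugin`, `CpuPerByte`, `ZScoreOutliers`, `run_pipeline`, and the `cpu_seconds` and `bytes_read` fields) is hypothetical.

```python
from abc import ABC, abstractmethod
from statistics import mean, pstdev
from typing import Any, Dict, List

# Hypothetical record type: one row of extracted measurements per machine.
Row = Dict[str, Any]


class Plugin(ABC):
    """Hypothetical plug-in contract: take rows from the store, return rows."""

    @abstractmethod
    def run(self, rows: List[Row]) -> List[Row]:
        ...


class CpuPerByte(Plugin):
    """Feature construction: derive a new feature from basic measurements."""

    def run(self, rows: List[Row]) -> List[Row]:
        # Rows with no bytes read are dropped to avoid division by zero.
        return [
            {**r, "cpu_per_byte": r["cpu_seconds"] / r["bytes_read"]}
            for r in rows
            if r["bytes_read"] > 0
        ]


class ZScoreOutliers(Plugin):
    """Statistical analysis: keep rows whose feature is far from the mean."""

    def __init__(self, feature: str, threshold: float = 2.0) -> None:
        self.feature = feature
        self.threshold = threshold

    def run(self, rows: List[Row]) -> List[Row]:
        values = [r[self.feature] for r in rows]
        if len(values) < 2:
            return []
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            return []  # all machines behave identically
        return [
            r for r in rows
            if abs(r[self.feature] - mu) / sigma > self.threshold
        ]


def run_pipeline(plugins: List[Plugin], rows: List[Row]) -> List[Row]:
    """Apply each plug-in in order, feeding its output to the next one."""
    for plugin in plugins:
        rows = plugin.run(rows)
    return rows


# Made-up measurements for illustration; not data from the paper.
rows = [{"machine": f"m{i}", "cpu_seconds": 10.0, "bytes_read": 100} for i in range(5)]
rows.append({"machine": "m5", "cpu_seconds": 100.0, "bytes_read": 100})
rows.append({"machine": "m6", "cpu_seconds": 5.0, "bytes_read": 0})

suspects = run_pipeline([CpuPerByte(), ZScoreOutliers("cpu_per_byte")], rows)
print([r["machine"] for r in suspects])
```

Keeping every plug-in to the same rows-in, rows-out shape is one possible design choice: it lets feature construction and analysis steps be composed in any order, and the result of any step could be handed to the visualization tool.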