New abstractions are simplifying the programming of large clusters, but diagnosis nonetheless becomes more challenging as clusters grow. The volume of debugging information increases linearly with cluster size, while the number of intercomponent relationships grows quadratically. Worse, the same abstractions that simplify programming can also obscure the relationships between high-level (application) and low-level (task/process/disk/CPU) information flows. In this paper we analyze the workflows of several users and system administrators who work with a large academic cluster (based on Hadoop, the popular implementation of the MapReduce abstraction) and propose improvements to its diagnosis-relevant information displays. We also offer a preliminary analysis of the efficacy of the proposed changes, which demonstrates a 40% reduction in the time taken to accomplish five representative diagnostic tasks compared with the current system.
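To make the linear-versus-quadratic contrast concrete, here is a minimal sketch. It assumes that each of n components contributes one stream of debugging information, and that a "relationship" is any unordered pair of components, counted once as n(n-1)/2. Both assumptions are illustrative simplifications rather than the paper's own model, and the function names `debug_sources` and `pairwise_relationships` are hypothetical.

```python
def debug_sources(n: int) -> int:
    """Hypothetical count of debugging-information sources:
    assumed to be one per component, so it grows linearly with n."""
    return n


def pairwise_relationships(n: int) -> int:
    """Hypothetical count of intercomponent relationships:
    unordered pairs of components, n choose 2 = n(n-1)/2,
    so it grows quadratically with n."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Compare growth for clusters of increasing size.
    for n in (10, 100, 1000):
        print(n, debug_sources(n), pairwise_relationships(n))
```

Under these assumptions, going from 100 to 1000 components multiplies the debugging sources by 10 (100 to 1000) but the pairwise relationships by roughly 100 (4950 to 499500).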