Monitoring distributed systems
ACM Transactions on Computer Systems (TOCS)
Debugging heterogeneous distributed systems using event-based models of behavior
ACM Transactions on Computer Systems (TOCS)
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
The Vision of Autonomic Computing
Computer
An Architecture for High Performance Network Analysis
ISCC '01 Proceedings of the Sixth IEEE Symposium on Computers and Communications
HiFi: A New Monitoring Architecture for Distributed Systems Management
ICDCS '99 Proceedings of the 19th IEEE International Conference on Distributed Computing Systems
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Request extraction in Magpie: events, schemas and temporal joins
Proceedings of the 11th workshop on ACM SIGOPS European workshop
Dynamic instrumentation of production systems
ATEC '04 Proceedings of the annual conference on USENIX Annual Technical Conference
Magpie: online modelling and performance-aware systems
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Path-based faliure and evolution management
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Using end-user latency to manage internet infrastructure
WIESS'02 Proceedings of the 2nd conference on Industrial Experiences with Systems Software - Volume 2
Measuring and characterizing system behavior using kernel-level event logging
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Short term performance forecasting in enterprise systems
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Capturing, indexing, clustering, and retrieving system history
Proceedings of the twentieth ACM symposium on Operating systems principles
The taser intrusion recovery system
Proceedings of the twentieth ACM symposium on Operating systems principles
WAP5: black-box performance debugging for wide-area systems
Proceedings of the 15th international conference on World Wide Web
Challenges in managing dependable data systems
ACM SIGMETRICS Performance Evaluation Review - Design, implementation, and performance of storage systems
Stardust: tracking activity in a distributed storage system
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Towards self-predicting systems: What if you could ask ‘what-if’?
The Knowledge Engineering Review
Problem diagnosis in large-scale computing environments
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Automated known problem diagnosis with event traces
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Using queries for distributed monitoring and forensics
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Automatic high-performance reconstruction and recovery
Computer Networks: The International Journal of Computer and Telecommunications Networking
Performance problem localization in self-healing, service-oriented systems using Bayesian networks
Proceedings of the 2007 ACM symposium on Applied computing
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
Operating systems should support business change
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Performance modeling and system management for multi-component online services
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Detecting performance anomalies in global applications
WORLDS'05 Proceedings of the 2nd conference on Real, Large Distributed Systems - Volume 2
Towards fingerpointing in the Emulab dynamic distributed system
WORLDS'06 Proceedings of the 3rd conference on USENIX Workshop on Real, Large Distributed Systems - Volume 3
Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches
Whodunit: transactional profiling for multi-tier applications
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Exploiting nonstationarity for performance prediction
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Towards highly reliable enterprise network services via inference of multi-level dependencies
Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications
Categorizing and differencing system behaviours
HotAC II Hot Topics in Autonomic Computing on Hot Topics in Autonomic Computing
AjaxScope: a platform for remotely monitoring the client-side behavior of web 2.0 applications
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
eMIVA: tool support for the instrumentation of critical distributed applications
ACM SIGMETRICS Performance Evaluation Review
Hardware counter driven on-the-fly request signatures
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Understanding and visualizing full systems with data flow tomography
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
PDA: a tool for automated problem determination
LISA'07 Proceedings of the 21st conference on Large Installation System Administration Conference
Hang analysis: fighting responsiveness bugs
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
BorderPatrol: isolating events for black-box tracing
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
30 seconds is not enough!: a study of operating system timer usage
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
Processor hardware counter statistics as a first-class system resource
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Live monitoring: using adaptive instrumentation and analysis to debug and maintain web applications
HOTOS'07 Proceedings of the 11th USENIX workshop on Hot topics in operating systems
Why did my pc suddenly slow down?
SYSML'07 Proceedings of the 2nd USENIX workshop on Tackling computer systems problems with machine learning techniques
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Tracking in a spaghetti bowl: monitoring transactions using footprints
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DieCast: testing distributed systems with an accurate scale model
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
D3S: debugging deployed distributed systems
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Remote profiling of resource constraints of web servers using mini-flash crowds
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
A dollar from 15 cents: cross-platform management for internet services
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
An Operating System Architecture for Future Information Appliances
SEUS '08 Proceedings of the 6th IFIP WG 10.2 international workshop on Software Technologies for Embedded and Ubiquitous Systems
Automatic request categorization in internet services
ACM SIGMETRICS Performance Evaluation Review
Evolution of storage management: transforming raw data into information
IBM Journal of Research and Development
Dynamic dependencies and performance improvement
LISA'08 Proceedings of the 22nd conference on Large installation system administration conference
Diagnosing distributed systems with self-propelled instrumentation
Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware
Causeway: support for controlling and analyzing the execution of multi-tier applications
Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware
mBrace: action-based performance monitoring of multi-tier web applications
Proceedings of the Third Workshop on Dependable Distributed Data Management
Improving the responsiveness of internet services with automatic cache placement
Proceedings of the 4th ACM European conference on Computer systems
Understanding customer problem troubleshooting from storage system logs
FAST '09 Proccedings of the 7th conference on File and storage technologies
CrystalBall: predicting and preventing inconsistencies in deployed distributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Automated anomaly detection and performance modeling of enterprise applications
ACM Transactions on Computer Systems (TOCS)
How to keep your head above water while detecting errors
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Towards a middleware for configuring large-scale storage infrastructures
Proceedings of the 7th International Workshop on Middleware for Grids, Clouds and e-Science
Macroscope: end-point approach to networked application dependency discovery
Proceedings of the 5th international conference on Emerging networking experiments and technologies
EbAT: online methods for detecting utility cloud anomalies
Proceedings of the 6th Middleware Doctoral Symposium
Estimating service resource consumption from response time measurements
Proceedings of the Fourth International ICST Conference on Performance Evaluation Methodologies and Tools
Ganesha: blackBox diagnosis of MapReduce systems
ACM SIGMETRICS Performance Evaluation Review
Do you know your IQ?: a research agenda for information quality in systems
ACM SIGMETRICS Performance Evaluation Review
A weighted spectrum metric for comparison of internet topologies
ACM SIGMETRICS Performance Evaluation Review
Predicting and preventing inconsistencies in deployed distributed systems
ACM Transactions on Computer Systems (TOCS)
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
SherLog: error diagnosis by connecting clues from run-time logs
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
SelfTalk for Dena: query language and runtime support for evaluating system behavior
ACM SIGOPS Operating Systems Review
Fingerprinting the datacenter: automated classification of performance crises
Proceedings of the 5th European conference on Computer systems
Passive inspection of sensor networks
DCOSS'07 Proceedings of the 3rd IEEE international conference on Distributed computing in sensor systems
Analyzing blocking to debug performance problems on multi-core systems
ACM SIGOPS Operating Systems Review
A load balancing framework for clustered storage systems
HiPC'08 Proceedings of the 15th international conference on High performance computing
A query language for understanding component interactions in production systems
Proceedings of the 24th ACM International Conference on Supercomputing
A query language and runtime tool for evaluating behavior of multi-tier servers
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
How to keep your head above water while detecting errors
Middleware'09 Proceedings of the ACM/IFIP/USENIX 10th international conference on Middleware
Measurement and diagnosis of address misconfigured P2P traffic
INFOCOM'10 Proceedings of the 29th conference on Information communications
Adaptive system anomaly prediction for large-scale hosting infrastructures
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
AjaxScope: A Platform for Remotely Monitoring the Client-Side Behavior of Web 2.0 Applications
ACM Transactions on the Web (TWEB)
Black-box problem diagnosis in parallel file systems
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Discovery of application workloads from network file traces
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Mochi: visual log-analysis based tools for debugging hadoop
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Towards automatic inference of task hierarchies in complex systems
HotDep'08 Proceedings of the Fourth conference on Hot topics in system dependability
Automating network application dependency discovery: experiences, limitations, and new solutions
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Quanto: tracking energy in networked embedded systems
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
SALSA: analyzing logs as state machines
WASL'08 Proceedings of the First USENIX conference on Analysis of system logs
A predictive and probabilistic load-balancing algorithm for cluster-based web servers
Applied Soft Computing
Look who's talking: discovering dependencies between virtual machines using CPU utilization
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Experiences with tracing causality in networked services
INM/WREN'10 Proceedings of the 2010 internet network management conference on Research on enterprise networking
Diagnosing mobile applications in the wild
Hotnets-IX Proceedings of the 9th ACM SIGCOMM Workshop on Hot Topics in Networks
Visualizing windows system traces
Proceedings of the 5th international symposium on Software visualization
Recognizing patterns in streams with imprecise timestamps
Proceedings of the VLDB Endowment
Automating configuration troubleshooting with dynamic information flow analysis
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Improving software diagnosability via log enhancement
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
MT-WAVE: profiling multi-tier web applications
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
DieCast: Testing Distributed Systems with an Accurate Scale Model
ACM Transactions on Computer Systems (TOCS)
Fine-grained power modeling for smartphones using system call tracing
Proceedings of the sixth conference on Computer systems
Diagnosing performance changes by comparing request flows
Proceedings of the 8th USENIX conference on Networked systems design and implementation
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
WiDS checker: combating bugs in distributed systems
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
X-trace: a pervasive network tracing framework
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
Friday: global comprehension for distributed replay
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
ASDF: an automated, online framework for diagnosing performance problems
Architecting dependable systems VII
Making programs forget: enforcing lifetime for sensitive data
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Rake: semantics assisted network-based tracing framework
Proceedings of the Nineteenth International Workshop on Quality of Service
A flexible architecture integrating monitoring and analytics for managing large-scale data centers
Proceedings of the 8th ACM international conference on Autonomic computing
G2: a graph processing system for diagnosing distributed systems
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
dFault: fault localization in large-scale peer-to-peer systems
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
PAL: Propagation-aware Anomaly Localization for cloud hosted distributed applications
SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
Practical experiences with chronics discovery in large telecommunications systems
SLAML '11 Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques
A self-adaptive monitoring framework for component-based software systems
ECSA'11 Proceedings of the 5th European conference on Software architecture
Using link gradients to predict the impact of network latency on multitier applications
IEEE/ACM Transactions on Networking (TON)
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Fay: extensible distributed tracing from kernels to clusters
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Bootstrapping energy debugging on smartphones: a first look at energy bugs in mobile devices
Proceedings of the 10th ACM Workshop on Hot Topics in Networks
Understanding and improving the diagnostic workflow of MapReduce users
CHIMIT '11 Proceedings of the 5th ACM Symposium on Computer Human Interaction for Management of Information Technology
Practical experiences with chronics discovery in large telecommunications systems
ACM SIGOPS Operating Systems Review
Improving Software Diagnosability via Log Enhancement
ACM Transactions on Computer Systems (TOCS) - Special Issue APLOS 2011
Dataflow Tomography: Information Flow Tracking For Understanding and Visualizing Full Systems
ACM Transactions on Architecture and Code Optimization (TACO)
DejaVu: accelerating resource allocation in virtualized environments
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Issues in automatic provenance collection
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Monere: monitoring of service compositions for failure diagnosis
ICSOC'11 Proceedings of the 9th international conference on Service-Oriented Computing
Causeway: support for controlling and analyzing the execution of multi-tier applications
Middleware'05 Proceedings of the ACM/IFIP/USENIX 6th international conference on Middleware
Modellus: Automated modeling of complex internet data center applications
ACM Transactions on the Web (TWEB)
A case for coordinated resource management in heterogeneous multicore platforms
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Understanding performance modeling for modular mobile-cloud applications
ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
scc: cluster storage provisioning informed by application characteristics and SLAs
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Structured comparative analysis of systems logs to diagnose performance problems
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Application dependency discovery using matrix factorization
Proceedings of the 2012 IEEE 20th International Workshop on Quality of Service
What is my program doing? program dynamics in programmer's terms
RV'11 Proceedings of the Second international conference on Runtime verification
Automated diagnosis without predictability is a recipe for failure
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
PowerTracer: tracing requests in multi-tier services to diagnose energy inefficiency
Proceedings of the 9th international conference on Autonomic computing
3-Dimensional root cause diagnosis via co-analysis
Proceedings of the 9th international conference on Autonomic computing
Proceedings of the 9th international conference on Autonomic computing
Fay: Extensible Distributed Tracing from Kernels to Clusters
ACM Transactions on Computer Systems (TOCS)
Collaborative energy debugging for mobile devices
HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
AppInsight: mobile app performance monitoring in the wild
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
X-ray: automating root-cause diagnosis of performance anomalies in production software
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
On the accurate identification of network service dependencies in distributed systems
lisa'12 Proceedings of the 26th international conference on Large Installation System Administration: strategies, tools, and techniques
Power containers: an OS facility for fine-grained power and energy management on multicore servers
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Performance problem diagnostics by systematic experimentation
Proceedings of the 18th international doctoral symposium on Components and architecture
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
Adaptive monitoring of web-based applications: a performance study
Proceedings of the 28th Annual ACM Symposium on Applied Computing
An online service-oriented performance profiling tool for cloud computing systems
Frontiers of Computer Science: Selected Publications from Chinese Universities
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Carat: collaborative energy diagnosis for mobile devices
Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems
Timecard: controlling user-perceived delays in server-based mobile applications
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
On fault resilience of OpenStack
Proceedings of the 4th annual Symposium on Cloud Computing
Recognizing patterns in streams with imprecise timestamps
Information Systems
Comprehending performance from real-world execution traces: a device-driver case
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Performance troubleshooting in data centers: an annotated bibliography?
ACM SIGOPS Operating Systems Review
Making problem diagnosiswork for large-scale, production storage systems
LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
Panappticon: event-based tracing to measure mobile application and platform performance
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Workload-aware anomaly detection for Web applications
Journal of Systems and Software
Adtributor: revenue debugging in advertising systems
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
NetCheck: network diagnoses from blackbox traces
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Tools to understand complex system behaviour are essential for many performance analysis and debugging tasks, yet there are many open research problems in their development. Magpie is a toolchain for automatically extracting a system's workload under realistic operating conditions. Using low-overhead instrumentation, we monitor the system to record fine-grained events generated by kernel, middleware and application components. The Magpie request extraction tool uses an application-specific event schema to correlate these events, and hence precisely capture the control flow and resource consumption of each and every request. By removing scheduling artefacts, whilst preserving causal dependencies, we obtain canonical request descriptions from which we can construct concise workload models suitable for performance prediction and change detection. In this paper we describe and evaluate the capability of Magpie to accurately extract requests and construct representative models of system behaviour.