Performance monitoring in most distributed systems provides minimal guidance for tuning, problem diagnosis, and decision making. Stardust is a monitoring infrastructure that replaces traditional performance counters with end-to-end traces of requests and allows for efficient querying of performance metrics. Such traces better inform key administrative performance challenges by enabling, for example, extraction of per-workload, per-resource demand information and per-workload latency graphs. This paper reports on our experience building and using end-to-end tracing as an on-line monitoring tool in a distributed storage system. Using diverse system workloads and scenarios, we show that such fine-grained tracing can be made efficient (less than 6% overhead) and is useful for on- and off-line analysis of system behavior. These experiences make a case for having other systems incorporate such an instrumentation framework.
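To make the idea of extracting per-workload, per-resource demand from end-to-end traces concrete, here is a minimal sketch. It is not Stardust's actual schema or query interface: the `TraceRecord` fields, the workload and resource names, and both helper functions are hypothetical, chosen only to illustrate how records tagged with a request identifier can be aggregated in ways that plain performance counters cannot.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """One hypothetical trace record: a request's time on a single resource."""
    request_id: int
    workload: str
    resource: str
    start_us: int
    end_us: int


def demand_by_workload(records):
    """Sum the time each workload spent on each resource (microseconds)."""
    demand = defaultdict(lambda: defaultdict(int))
    for r in records:
        demand[r.workload][r.resource] += r.end_us - r.start_us
    return {w: dict(res) for w, res in demand.items()}


def latency_by_request(records):
    """End-to-end latency per request: last end time minus first start time."""
    spans = {}
    for r in records:
        key = (r.workload, r.request_id)
        lo, hi = spans.get(key, (r.start_us, r.end_us))
        spans[key] = (min(lo, r.start_us), max(hi, r.end_us))
    return {k: hi - lo for k, (lo, hi) in spans.items()}


# Illustrative data only; these values are made up for the example.
records = [
    TraceRecord(1, "oltp", "cpu", 0, 40),
    TraceRecord(1, "oltp", "disk", 40, 140),
    TraceRecord(2, "scan", "disk", 10, 310),
    TraceRecord(2, "scan", "network", 310, 360),
    TraceRecord(3, "oltp", "cpu", 50, 80),
]

demand = demand_by_workload(records)
latency = latency_by_request(records)
```

Because every record carries the request and workload it belongs to, the same trace answers both questions: how much of each resource a workload consumed, and how long each request took end to end. An aggregate counter such as total disk busy time would not separate the two workloads.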