This paper studies the use of statistical induction techniques as a basis for automated performance diagnosis and performance management. The goal of the work is to develop and evaluate tools for offline and online analysis of system metrics gathered from instrumentation in Internet server platforms. We use a promising class of probabilistic models, Tree-Augmented Bayesian Networks (TANs), to identify combinations of system-level metrics and threshold values that correlate with high-level performance states (compliance with Service Level Objectives, or SLOs, for average-case response time) in a three-tier Web service under a variety of conditions. Experimental results from a testbed show that TAN models involving small subsets of metrics capture patterns of performance behavior accurately and yield insights into the causes of observed performance effects. TANs are extremely efficient to represent and evaluate, and their interpretability makes them excellent candidates for automated diagnosis and control. We explore the use of TAN models for offline forensic diagnosis and, in a limited online setting, for performance forecasting with stable workloads.
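To make the modeling idea concrete, the following is a minimal sketch of the structure-learning step commonly used for TAN classifiers: score each pair of metrics by conditional mutual information given the class (here, SLO violation or compliance), then keep a maximum spanning tree over those scores. It is not the tooling used in this work. The metric names, the toy data, and the helper names `cond_mutual_info` and `tan_tree` are all assumptions made for illustration, and the metrics are assumed to be already discretized against thresholds.

```python
from collections import Counter
from itertools import combinations
from math import log


def cond_mutual_info(xi, xj, c):
    """Empirical I(Xi; Xj | C) in nats for three equal-length discrete sequences."""
    n = len(c)
    n_ijc = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    total = 0.0
    for (a, b, k), count in n_ijc.items():
        # p(a,b,k) * log( p(a,b,k) p(k) / (p(a,k) p(b,k)) ), written with counts
        total += (count / n) * log(count * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return total


def tan_tree(columns, labels):
    """Return (parent, child) edges of a maximum spanning tree over the metrics.

    Edge weights are conditional mutual information given the class label.
    The tree is grown Prim-style from the first metric, which acts as the root.
    """
    names = list(columns)
    weight = {}
    for a, b in combinations(names, 2):
        w = cond_mutual_info(columns[a], columns[b], labels)
        weight[(a, b)] = weight[(b, a)] = w

    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        # Pick the heaviest edge that connects the tree to a new metric.
        parent, child = max(
            ((p, ch) for p in in_tree for ch in names if ch not in in_tree),
            key=lambda e: weight[e],
        )
        edges.append((parent, child))
        in_tree.append(child)
    return edges


# Hypothetical toy data: 1 means the metric exceeded its threshold in that interval.
metrics = {
    "cpu_high":  [0, 0, 1, 1, 0, 0, 1, 1],
    "disk_high": [0, 0, 1, 1, 0, 0, 1, 1],
    "net_high":  [0, 1, 0, 1, 0, 1, 0, 1],
}
slo_violated = [0, 0, 0, 0, 1, 1, 1, 1]

edges = tan_tree(metrics, slo_violated)
print(edges)
```

In a full TAN classifier, each metric in the resulting tree would additionally take the class variable as a parent, and conditional probability tables would be estimated from the same counts. In this toy data the first edge links `cpu_high` and `disk_high`, because those two metrics move together within each SLO state, while `net_high` is independent of both.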