The cloud computing paradigm provides the "illusion" of infinite resources and is therefore a promising candidate for large-scale data-intensive computing. In this paper, we explore experiment-driven performance models for data-intensive workloads executing in an infrastructure-as-a-service (IaaS) public cloud. The performance models help predict workload behaviour and serve as a key component of a larger framework for resource provisioning in the cloud. We determine a suitable prediction technique by comparing popular regression methods. We also enumerate the variables that contribute to variance in workload performance in a public cloud. Finally, we build a performance model for a multi-tenant data service in the Amazon cloud. We find that a linear classifier is sufficient in most cases. On a few occasions, a linear classifier is unsuitable and non-linear modeling, which is time-consuming, is required. Consequently, we recommend training the performance model with a linear classifier in the first instance. If the resulting model is unsatisfactory, non-linear modeling can be carried out as a second step.
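The linear-first recommendation can be sketched as a small training routine. This is only an illustration under stated assumptions, not the method used in the paper: it treats the "linear classifier" as a one-variable least-squares fit, uses k-nearest-neighbour regression as a stand-in for non-linear modeling, and judges a model "unsatisfactory" when its R² on held-out data falls below a hypothetical threshold of 0.9. All function names are invented for this example.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predictor function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """Assumed non-linear stand-in: average the k nearest training points."""
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def r_squared(model, xs, ys):
    """Coefficient of determination of `model` on the given data."""
    my = mean(ys)
    ss_res = sum((y - model(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def train_performance_model(train, valid, threshold=0.9):
    """Try the cheap linear model first; fall back only if it validates poorly."""
    linear = fit_linear(*train)
    if r_squared(linear, *valid) >= threshold:
        return "linear", linear
    # The threshold and the fallback technique are assumptions of this sketch.
    return "non-linear", fit_knn(*train)


# Hypothetical measurements: x is a workload parameter, y a response time.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
linear_ys = [2 * x + 1 for x in xs]
curved_ys = [(x - 4.5) ** 2 for x in xs]

kind_a, model_a = train_performance_model(
    (xs, linear_ys), ([2.5, 4.5, 6.5], [6.0, 10.0, 14.0]))
kind_b, model_b = train_performance_model(
    (xs, curved_ys), ([1.5, 4.5, 7.5], [9.0, 0.0, 9.0]))
```

Here the first dataset is accepted by the linear step, while the second, which is symmetric around its minimum, fails validation and triggers the fallback. In practice the validation metric, the threshold, and the non-linear technique would be chosen to suit the workload.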