Towards building performance models for data-intensive workloads in public clouds

Authors:
Rizwan Mian;Patrick Martin;Farhana Zulkernine;Jose Luis Vazquez-Poletti
Affiliations:
Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;Universidad Complutense de Madrid, Madrid, Spain
Venue:
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
Year:
2013

Abstract

The cloud computing paradigm provides the "illusion" of infinite resources, which makes it a promising candidate for large-scale data-intensive computing. In this paper, we explore experiment-driven performance models for data-intensive workloads executing in an infrastructure-as-a-service (IaaS) public cloud. The performance models help predict workload behaviour and serve as a key component of a larger framework for resource provisioning in the cloud. We determine a suitable prediction technique by comparing popular regression methods. We also enumerate the variables that contribute to variance in workload performance in a public cloud. Finally, we build a performance model for a multi-tenant data service in the Amazon cloud. We find that a linear classifier is sufficient in most cases. On a few occasions, a linear classifier is unsuitable and non-linear modeling, which is time-consuming, is required. Consequently, we recommend training the performance model with a linear classifier in the first instance and carrying out non-linear modeling as a second step only if the resulting model is unsatisfactory.
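
The two-step recommendation in the abstract (train a linear model first, and fall back to non-linear modeling only if the result is unsatisfactory) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single predictor, the use of R² on held-out data as the "satisfactory" criterion, the 0.9 threshold, the k-nearest-neighbour regressor standing in for a non-linear method, and all function names.

```python
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression, used here only as a simple
    stand-in for a non-linear method (an assumption of this sketch)."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean([y for _, y in nearest])
    return predict

def r_squared(predict, xs, ys):
    """Coefficient of determination of `predict` on the given data."""
    my = mean(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def train_performance_model(train, test, threshold=0.9):
    """Try the linear model first; train the non-linear fallback only
    when the linear fit on held-out data falls below `threshold`
    (a hypothetical acceptance criterion)."""
    (x_train, y_train), (x_test, y_test) = train, test
    linear = fit_linear(x_train, y_train)
    if r_squared(linear, x_test, y_test) >= threshold:
        return "linear", linear
    return "non-linear", fit_knn(x_train, y_train)

# Synthetic data: response time that grows linearly with load.
linear_train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])
linear_test = ([0.5, 2.5, 4.5], [2, 6, 10])

# Synthetic data: a symmetric quadratic response that a line cannot fit.
curved_train = ([-3, -2, -1, 0, 1, 2, 3], [9, 4, 1, 0, 1, 4, 9])
curved_test = ([-2.5, -0.5, 0.5, 2.5], [6.25, 0.25, 0.25, 6.25])

kind_linear, _ = train_performance_model(linear_train, linear_test)
kind_curved, _ = train_performance_model(curved_train, curved_test)
```

The ordering reflects the cost argument in the abstract: the cheap linear fit is always attempted, and the more expensive non-linear training runs only when the acceptance check fails.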