Parallel analytics as a service

Authors:
Petrie Wong; Zhian He; Eric Lo
Affiliations:
The Hong Kong Polytechnic University, Hong Kong, Hong Kong; The Hong Kong Polytechnic University, Hong Kong, Hong Kong; The Hong Kong Polytechnic University, Hong Kong, Hong Kong
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

Recently, massively parallel processing relational database systems (MPPDBs) have gained much momentum in the big data analytics market. With the advent of hosted cloud computing, we envision that MPPDB-as-a-Service (MPPDBaaS) will become attractive to companies whose analytical tasks involve only hundreds of gigabytes to around ten terabytes of data, because they can enjoy high-end parallel analytics at low cost. This paper presents Thrifty, a prototype implementation of MPPDB-as-a-Service. The major research issue is how to achieve a lower total cost of ownership by consolidating thousands of MPPDB tenants onto a shared hardware infrastructure, with a performance SLA guaranteeing that tenants obtain query results as if they were executing their queries on dedicated machines. Thrifty achieves this goal with a tenant-driven design that includes (1) a cluster design that carefully arranges the nodes in the cluster into groups and creates an MPPDB for each group of nodes, (2) a tenant placement that assigns each tenant to several MPPDBs (for high availability through replication), and (3) a query routing algorithm that routes a tenant's query to the proper MPPDB at run time. Experiments show that in an MPPDBaaS with 5000 tenants, where each tenant requests a 2- to 32-node MPPDB to query 200 GB to 3.2 TB of data, Thrifty can serve all the tenants with a 99.9% performance SLA guarantee and a replication factor of 3 for high availability, using only 18.7% of the nodes requested by the tenants.
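The abstract names Thrifty's three components (node groups hosting MPPDBs, replicated tenant placement, run-time query routing) but not their algorithms. The Python sketch below shows one way such pieces could fit together. It is a toy model under our own assumptions, not the paper's method: each MPPDB runs at most one tenant's query at a time, a tenant may be placed on any MPPDB with at least as many nodes as it requested, and placement is a simple greedy choice of the least-loaded MPPDBs. The names `MPPDB`, `place_tenants`, `route_query`, and `finish_query` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MPPDB:
    """One MPPDB deployed on a group of nodes (hypothetical toy model)."""
    name: str
    num_nodes: int
    tenants: List[str] = field(default_factory=list)
    busy_with: Optional[str] = None  # tenant whose query is running, if any


def place_tenants(dbs: List[MPPDB], requests: Dict[str, int],
                  replicas: int = 3) -> Dict[str, List[MPPDB]]:
    """Greedily assign each tenant to `replicas` distinct MPPDBs that have
    at least as many nodes as the tenant requested (assumed rule)."""
    placement: Dict[str, List[MPPDB]] = {}
    for tenant, nodes in requests.items():
        eligible = [db for db in dbs if db.num_nodes >= nodes]
        if len(eligible) < replicas:
            raise ValueError(f"not enough MPPDBs with >= {nodes} nodes for {tenant}")
        # Prefer the least-loaded MPPDBs, then the smallest sufficient ones.
        eligible.sort(key=lambda db: (len(db.tenants), db.num_nodes, db.name))
        chosen = eligible[:replicas]
        for db in chosen:
            db.tenants.append(tenant)
        placement[tenant] = chosen
    return placement


def route_query(tenant: str, placement: Dict[str, List[MPPDB]]) -> Optional[MPPDB]:
    """Send the query to an idle replica so it runs alone, as if on dedicated
    nodes; return None if every replica is busy (the query would have to wait)."""
    for db in placement[tenant]:
        if db.busy_with is None:
            db.busy_with = tenant
            return db
    return None


def finish_query(db: MPPDB) -> None:
    """Mark the MPPDB idle again once the routed query completes."""
    db.busy_with = None


# Usage: four 4-node MPPDBs shared by three tenants, each replicated 3 times.
dbs = [MPPDB(f"db{i}", 4) for i in range(4)]
placement = place_tenants(dbs, {"t1": 2, "t2": 4, "t3": 4})
print([db.name for db in placement["t1"]])  # ['db0', 'db1', 'db2']
print(route_query("t2", placement).name)    # db3
```

In this toy model, consolidation and the dedicated-machine SLA coexist because tenants share MPPDBs only over time, never concurrently; replication gives the router several candidates, so a query waits only when all of its tenant's replicas are busy. Thrifty's actual cluster design, placement, and routing algorithms are described in the paper and may differ from this sketch.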