LinearDB: a relational approach to make data warehouse scale like MapReduce

Authors:
Huiju Wang; Xiongpai Qin; Yansong Zhang; Shan Wang; Zhanwei Wang
Affiliations:
All five authors: Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), MOE, Beijing, P.R. China and School of Information, Renmin University of China, Beijing, P.R. China
Venue:
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
Year:
2011

Citing 11
Cited 0

SYSTEM/U: a database system based on the universal relation assumption. ACM Transactions on Database Systems (TODS)
An overview of data warehousing and OLAP technology. ACM SIGMOD Record
Heuristic optimization of OLAP queries in multidimensionally hierarchically clustered databases. Proceedings of the 4th ACM international workshop on Data warehousing and OLAP
The Universal B-Tree for Multidimensional Indexing: general Concepts. WWCA '97 Proceedings of the International Conference on Worldwide Computing and Its Applications
Improving OLAP Performance by Multidimensional Hierarchical Clustering. IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
MapReduce: simplified data processing on large clusters. OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Processing star queries on hierarchically-clustered fact tables. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Pig latin: a not-so-foreign language for data processing. Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Constant-Time Query Processing. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
A comparison of approaches to large-scale data analysis. Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Abstract

Parallel databases achieve high performance on computer clusters, but their scalability is limited by a number of factors. MapReduce-based systems are highly scalable, yet their performance is unsatisfactory for data-intensive applications. In this paper, we explore the feasibility of building a data warehouse that combines the best features of both technologies: the efficiency of parallel databases and the scalability and fault tolerance of MapReduce. To this end, we design a prototype system called LinearDB. LinearDB organizes data in a decomposed snowflake schema and processes queries with three operations: transform, reduce, and merge. All of these techniques are designed specifically for the cluster environment. Our experimental results show that LinearDB's scalability matches that of MapReduce and that its performance is up to 3 times as good as that of PostgreSQL.