LinearDB: a relational approach to make data warehouse scale like MapReduce

  • Authors:
  • Huiju Wang;Xiongpai Qin;Yansong Zhang;Shan Wang;Zhanwei Wang

  • Affiliations:
  • Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), MOE, Beijing, P.R. China and School of Information, Renmin University of China, Beijing, P.R. China (all authors)

  • Venue:
  • DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications: Part II
  • Year:
  • 2011

Abstract

Parallel databases achieve high performance on computer clusters, but their scalability is limited by a number of factors. MapReduce-based systems are highly scalable, yet their performance is not satisfactory for data-intensive applications. In this paper, we explore the feasibility of building a data warehouse that combines the best features of both technologies: the efficiency of parallel databases and the scalability and fault tolerance of MapReduce. To this end, we design a prototype system called LinearDB. LinearDB organizes data in a decomposed snowflake schema and processes queries with three operations: transform, reduce, and merge. All of these techniques are designed specifically for the cluster environment. Our experimental results show that LinearDB's scalability matches that of MapReduce and that its performance is up to 3 times as good as PostgreSQL's.
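
The abstract names the three operations (transform, reduce, merge) but does not define them. The Python sketch below is only one plausible reading, not the authors' algorithm. It assumes that transform turns a dimension-table predicate into a set of qualifying keys, that reduce aggregates each node's fact-table partition locally, and that merge combines the partial results. All names and data (`DIM_PRODUCT`, `FACT_PARTITIONS`, `run_query`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: one dimension table and a fact table
# split across two "nodes". Not taken from the paper.
DIM_PRODUCT = {1: "book", 2: "toy", 3: "book"}  # product_id -> category
FACT_PARTITIONS = [
    [(1, 10.0), (2, 5.0)],            # node 0: (product_id, amount)
    [(3, 7.5), (1, 2.5), (2, 1.0)],   # node 1
]


def transform(dim, category):
    """Assumed 'transform': turn a dimension predicate into qualifying keys."""
    return {pid for pid, cat in dim.items() if cat == category}


def reduce_partition(rows, keys):
    """Assumed 'reduce': aggregate one node's fact rows for qualifying keys."""
    partial = defaultdict(float)
    for pid, amount in rows:
        if pid in keys:
            partial[pid] += amount
    return dict(partial)


def merge(partials):
    """Assumed 'merge': combine per-node partial aggregates."""
    total = defaultdict(float)
    for partial in partials:
        for pid, amount in partial.items():
            total[pid] += amount
    return dict(total)


def run_query(category):
    """Sum the amount per product for one category across all partitions."""
    keys = transform(DIM_PRODUCT, category)
    partials = [reduce_partition(rows, keys) for rows in FACT_PARTITIONS]
    return merge(partials)
```

Under this reading, each node touches only its own partition during the reduce step, so adding nodes adds partitions without changing the per-node work. That is the general intuition behind scaling in the MapReduce style, but the sketch makes no claim about how LinearDB itself achieves it.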