Parallel databases achieve high performance by running on computer clusters, but their scalability is limited by a number of factors. MapReduce-based systems, in contrast, are highly scalable, but their performance is unsatisfactory for data-intensive applications. In this paper, we explore the feasibility of building a data warehouse that combines the best features of both technologies: the efficiency of parallel databases and the scalability and fault tolerance of MapReduce. To this end, we design a prototype system called LinearDB. LinearDB organizes data in a decomposed snowflake schema and processes queries using three operations: transform, reduce, and merge. All of these techniques are designed specifically for the cluster environment. Our experimental results show that LinearDB's scalability matches that of MapReduce and that its performance is up to 3 times as good as that of PostgreSQL.