Parallel shared-nothing architectures are frequently used to handle large star-schema data warehouses (DW). The continuous increase in data volume, combined with the star-schema storage organization, severely limits scalability because of the well-known parallel join problem: joins require either on-the-fly repartitioning of data or intermediate results, or massive replication of large data sets that must still be joined locally, and both constrain the system's ability to deliver fast results. Additional parallelism may improve query performance, but some business decisions require query results to be available in a timely manner, and this cannot be guaranteed even with more parallelism and significant upgrade costs (both monetary and from the disruption of normal operations). We propose a Timely-aware Execution Parallel Architecture (TEEPA), which balances data load and query processing among an elastic set of non-dedicated heterogeneous nodes in order to provide scale-out performance and timely query results. Data is allocated using adaptable storage models that best fit the nodes' capabilities and minimize join costs (the major uncertainty factor), while preserving a consistent logical view of the star schema. We present an experimental evaluation of TEEPA and demonstrate its ability to provide timely results.
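To make the idea of balancing data load across heterogeneous nodes concrete, the sketch below splits a fact table's rows in proportion to each node's scan throughput, so that faster nodes receive more data and all nodes finish at roughly the same time. This is only an illustration of capability-proportional allocation in general, not the allocation method TEEPA uses, which the abstract does not specify. The function names, node names, and throughput figures are hypothetical.

```python
from typing import Dict


def allocate_rows(total_rows: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Split total_rows across nodes in proportion to node throughput.

    `throughput` maps a (hypothetical) node name to the rows per second it
    can scan. The largest-remainder method is used so that the integer
    shares add up exactly to total_rows.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("every node needs a positive throughput")

    total_tp = sum(throughput.values())
    # Ideal (fractional) share of each node.
    exact = {n: total_rows * t / total_tp for n, t in throughput.items()}
    # Round down first, then hand out the leftover rows one by one.
    shares = {n: int(x) for n, x in exact.items()}
    leftover = total_rows - sum(shares.values())
    by_fraction = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in by_fraction[:leftover]:
        shares[n] += 1
    return shares


def estimated_seconds(shares: Dict[str, int], throughput: Dict[str, float]) -> float:
    """Scan time of the slowest node, which bounds the parallel scan time."""
    return max(shares[n] / throughput[n] for n in shares)


if __name__ == "__main__":
    # Assumed figures: node "b" scans three times faster than node "a".
    tp = {"a": 1.0, "b": 3.0}
    shares = allocate_rows(1000, tp)
    print(shares, estimated_seconds(shares, tp))
```

With an even split of 500 rows per node, node "a" would need 500 seconds under these assumed figures, while the proportional split lets both nodes finish in 250 seconds. A real system would also have to account for join locality and changing node load, which this sketch ignores.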