Data-parallel frameworks are attractive for large-scale data processing because they let an application process huge amounts of data on commodity machines with little effort. MapReduce, a popular data-parallel framework, is used in fields such as web search, data mining, and data warehousing, and it has proven very practical for such applications. Star-join queries are common in data warehouses, which are a current target domain of data-parallel frameworks. This article proposes a new algorithm that efficiently processes star-join queries in data-parallel frameworks such as MapReduce and Dryad. Our star-join algorithm for general data-parallel frameworks, called Scatter-Gather-Merge, processes star-join queries in a constant number of computation steps, even as the number of participating dimension tables increases. By adopting Bloom filters, Scatter-Gather-Merge avoids a non-trivial amount of I/O. We also show that Scatter-Gather-Merge can be easily applied to MapReduce. Our experimental results in both cluster and cloud environments show that Scatter-Gather-Merge outperforms existing approaches.
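To make the role of Bloom filters in a star join concrete, the following is a minimal single-process sketch, not the Scatter-Gather-Merge algorithm itself, whose steps the abstract does not spell out. It assumes each dimension table has already been reduced to the rows that pass its predicate, builds one Bloom filter per dimension over the surviving keys, and uses those filters to discard fact rows before the exact join. The names `BloomFilter`, `star_join`, and the column names are hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def star_join(fact_rows, dimensions):
    """Join fact rows against every dimension (illustrative assumption).

    `dimensions` maps a foreign-key column name to {key: attributes} for the
    dimension rows that already passed that dimension's predicate.
    """
    filters = {}
    for fk, rows in dimensions.items():
        bf = BloomFilter()
        for key in rows:
            bf.add(key)
        filters[fk] = bf

    joined = []
    for fact in fact_rows:
        # Cheap pre-filter: drop fact rows that cannot match some dimension.
        if not all(filters[fk].might_contain(fact[fk]) for fk in dimensions):
            continue
        # Exact lookup removes any Bloom filter false positives.
        if all(fact[fk] in dimensions[fk] for fk in dimensions):
            out = dict(fact)
            for fk in dimensions:
                out.update(dimensions[fk][fact[fk]])
            joined.append(out)
    return joined


fact_rows = [
    {"date_id": 1, "store_id": 10, "sales": 5},
    {"date_id": 2, "store_id": 10, "sales": 7},
    {"date_id": 1, "store_id": 20, "sales": 9},
]
dimensions = {
    "date_id": {1: {"year": 2012}},
    "store_id": {10: {"city": "Seoul"}},
}
result = star_join(fact_rows, dimensions)
```

In a distributed setting the filters would be small enough to ship to the workers that scan the fact table, so most non-matching fact rows never need to be written out or shuffled. The exact lookup after the filter is what keeps the result correct despite false positives.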