Distributed processing commonly requires data to be spread across machines using an a priori static or hash-based allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just-in-time to the worker nodes using a need-to-know policy and is reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing data transfer costs. Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free, and allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
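The bidding idea can be illustrated with a minimal sketch. It is not the system described in the paper: the `Worker` class, the `schedule` function, the fragment names, and the sizes are all hypothetical. The sketch assumes a worker's bid is simply the number of bytes that would still have to be shipped to it for a query, and it ignores worker load entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node that remembers which fragments it holds."""
    name: str
    cache: set = field(default_factory=set)

    def bid(self, needed, sizes):
        # Assumed bid: bytes that would still have to be shipped to this worker.
        return sum(sizes[f] for f in needed - self.cache)


def schedule(query_fragments, workers, sizes):
    """Master side: pick the lowest bid, then ship only the missing fragments."""
    needed = set(query_fragments)
    # On a tie, min() keeps the first worker in the list.
    winner = min(workers, key=lambda w: w.bid(needed, sizes))
    shipped = needed - winner.cache
    cost = sum(sizes[f] for f in shipped)
    winner.cache |= shipped  # shipped data stays available for later queries
    return winner.name, cost


# Illustrative fragment sizes in arbitrary units (made up for this example).
sizes = {"orders": 100, "lineitem": 400, "customer": 50}
workers = [Worker("w1"), Worker("w2")]

first = schedule(["orders", "lineitem"], workers, sizes)
second = schedule(["lineitem", "customer"], workers, sizes)
third = schedule(["customer"], workers, sizes)
print(first, second, third)
```

In this toy run the second query goes to the worker that already holds `lineitem`, so only `customer` is shipped, and the third query needs no transfer at all. Because the assumed bid counts transfer cost only, every query ends up on the same worker; a realistic bid would presumably also reflect how busy each worker is.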