Although MapReduce systems were originally proposed to provide fault tolerance and scalability for data analysis queries on unstructured data over massive clusters, they are now used to analyze rich combinations of unstructured, semi-structured, and structured data. To perform well on these new workloads, MapReduce systems (and the distributed file systems on which they are built) can no longer rely on static data placement strategies. In this thesis, we propose new physical data independence and adaptive data tuning solutions that can greatly improve the performance of analysis queries in systems where workloads change over time and may include complex queries with overlapping or related computations (subqueries). While building on the work on physical data independence in relational systems, we propose novel strategies that recognize the central role of data partitioning (and co-partitioning) in shared-nothing distributed file systems.