Development of an object-oriented DBMS
OOPLSA '86 Conference proceedings on Object-oriented programming systems, languages and applications
A performance analysis of the gamma database machine
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Soot - a Java bytecode optimization framework
CASCON '99 Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research
Extracting queries by static analysis of transparent persistence
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Interprocedural query extraction for transparent persistence
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
MRBench: A Benchmark for MapReduce Framework
ICPADS '08 Proceedings of the 2008 14th IEEE International Conference on Parallel and Distributed Systems
Queryll: Java database queries through bytecode rewriting
Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
MapReduce and parallel DBMSs: friends or foes?
Communications of the ACM - Amir Pnueli: Ahead of His Time
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
JReq: database queries in imperative languages
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Steno: automatic optimization of declarative queries
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics
Proceedings of the 2nd ACM Symposium on Cloud Computing
Trojan data layouts: right shoes for a running elephant
Proceedings of the 2nd ACM Symposium on Cloud Computing
SciHadoop: array-based query processing in Hadoop
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Invisible loading: access-driven data transfer from raw files into database systems
Proceedings of the 16th International Conference on Extending Database Technology
Optimizing database-backed applications with query synthesis
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Rhea: automatic filtering for unstructured cloud storage
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
MapReduce is a cost-effective way to achieve scalable performance for many log-processing workloads. These workloads typically process their entire dataset. MapReduce can be inefficient, however, when handling business-oriented workloads, especially when these workloads access only a subset of the data. HadoopToSQL seeks to improve MapReduce performance for the latter class of workloads by transforming MapReduce queries to use the indexing, aggregation and grouping features provided by SQL databases. It statically analyzes the computation performed by the MapReduce queries. The static analysis uses symbolic execution to derive preconditions and postconditions for the map and reduce functions. It then uses this information either to generate input restrictions, which avoid scanning the entire dataset, or to generate equivalent SQL queries, which take advantage of SQL grouping and aggregation features. We demonstrate the performance of MapReduce queries, when optimized by HadoopToSQL, by both single-node and cluster experiments. HadoopToSQL always improves performance over MapReduce and approximates that of hand-written SQL.