A Fast Regular Expression Indexing Engine
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
C-store: a column-oriented DBMS
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Integrating compression and execution in column-oriented database systems
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Quincy: fair scheduling for distributed computing clusters
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Optimizing joins in a map-reduce environment
Proceedings of the 13th International Conference on Extending Database Technology
A common substrate for cluster computing
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)
Proceedings of the VLDB Endowment
Automatic optimization for MapReduce programs
Proceedings of the VLDB Endowment
Steno: automatic optimization of declarative queries
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Full-text indexing for optimizing selection operations in large-scale data analytics
Proceedings of the second international workshop on MapReduce and its applications
Trojan data layouts: right shoes for a running elephant
Proceedings of the 2nd ACM Symposium on Cloud Computing
An optimization framework for map-reduce queries
Proceedings of the 15th International Conference on Extending Database Technology
Only aggressive elephants are fast elephants
Proceedings of the VLDB Endowment
Invisible loading: access-driven data transfer from raw files into database systems
Proceedings of the 16th International Conference on Extending Database Technology
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
Representing mapreduce optimisations in the nested relational calculus
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Hi-index | 0.00 |
The MapReduce distributed programming framework is very popular, but currently lacks the optimization techniques that have been standard with relational database systems for many years. This paper proposes Manimal, which uses static code analysis to detect MapReduce program semantics and thereby enable wholly-automatic optimization of MapReduce programs. For example, a programmer's map function that emits data only when an if... statement holds true is essentially encoding a selection condition; code analysis can detect and characterize these conditions. If Manimal has an appropriate index available, it can then alter MapReduce execution to use it. Manimal can address many different optimization opportunities, including projections, structure-aware data compression, and others. However, this paper illustrates the system by focusing on one: efficient selection. We give a static analysis algorithm that can detect selections in user programs, and cover how Manimal can employ a B+Tree to execute these selections efficiently at runtime. Testing Manimal on several standard MapReduce programs, we show that selection alone can automatically reduce a standard program's runtime to 63% of conventional MapReduce execution time on identical hardware. We also give an in-depth discussion of other optimization targets and detection techniques.