MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
ACM Computing Surveys (CSUR)
Advanced compiler design and implementation
Advanced compiler design and implementation
Experimental study of minimum cut algorithms
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
ICSE '81 Proceedings of the 5th international conference on Software engineering
LINQ: reconciling object, relations and XML in the .NET framework
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Inferring Dataflow Properties of User Defined Table Processors
SAS '09 Proceedings of the 16th International Symposium on Static Analysis
Distributed aggregation for data-parallel computing: interfaces and implementations
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Building a high-level dataflow system on top of Map-Reduce: the Pig experience
Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
HadoopToSQL: a mapReduce query optimizer
Proceedings of the 5th European conference on Computer systems
On the relative completeness of bytecode analysis versus source code analysis
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
FlumeJava: easy, efficient data-parallel pipelines
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Automatic optimization for MapReduce programs
Proceedings of the VLDB Endowment
Nova: continuous Pig/Hadoop workflows
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Optimizing data partitioning for data-parallel computing
HotOS'13 Proceedings of the 13th USENIX conference on Hot topics in operating systems
Steno: automatic optimization of declarative queries
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Re-optimizing data-parallel computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Optimizing data shuffling in data-parallel computation by understanding user-defined functions
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
SCOPE: parallel databases meet MapReduce
The VLDB Journal — The International Journal on Very Large Data Bases
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
Leveraging endpoint flexibility in data-intensive clusters
Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM
Hi-index | 0.00 |
To minimize the amount of data-shuffling I/O that occurs between the pipeline stages of a distributed data-parallel program, its procedural code must be optimized with full awareness of the pipeline that it executes in. Unfortunately, neither pipeline optimizers nor traditional compilers examine both the pipeline and procedural code of a data-parallel program so programmers must either hand-optimize their program across pipeline stages or live with poor performance. To resolve this tension between performance and programmability, this paper describes PeriSCOPE, which automatically optimizes a data-parallel program's procedural code in the context of data flow that is reconstructed from the program's pipeline topology. Such optimizations eliminate unnecessary code and data, perform early data filtering, and calculate small derived values (e.g., predicates) earlier in the pipeline, so that less data--sometimes much less data--is transferred between pipeline stages. We describe how PeriSCOPE is implemented and evaluate its effectiveness on real production jobs.