Implicit parallel programming in pH
Implicit parallel programming in pH
C**: A Large-Grain, Object-Oriented, Data-Parallel Programming Language
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
New Ideas in Parallel Lisp: Language Design, Implementation, and Programming Tools
Proceedings of the US/Japan Workshop on Parallel Lisp: Languages and Systems
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
LINQ: reconciling object, relations and XML in the .NET framework
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Experiences with MapReduce, an abstraction for large-scale computation
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Dremel: interactive analysis of web-scale datasets
Proceedings of the VLDB Endowment
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Dremel: interactive analysis of web-scale datasets
Communications of the ACM
ASTERIX: towards a scalable, semistructured data platform for evolving-world models
Distributed and Parallel Databases
Steno: automatic optimization of declarative queries
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the second international workshop on MapReduce and its applications
Fay: extensible distributed tracing from kernels to clusters
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Streams that compose using macros that oblige
PEPM '12 Proceedings of the ACM SIGPLAN 2012 workshop on Partial evaluation and program manipulation
Communications of the ACM
Jockey: guaranteed job latency in data parallel clusters
Proceedings of the 7th ACM european conference on Computer Systems
MadLINQ: large-scale distributed matrix computation for the cloud
Proceedings of the 7th ACM european conference on Computer Systems
Queue - Development
PerfXplain: debugging MapReduce job performance
Proceedings of the VLDB Endowment
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Re-optimizing data-parallel computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Asynchronous adaptive optimisation for generic data-parallel array programming
Concurrency and Computation: Practice & Experience
Swift: A language for distributed parallel scripting
Parallel Computing
Journal of Systems and Software
From a calculus to an execution environment for stream processing
Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems
The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Stubby: a transformation-based optimizer for MapReduce workflows
Proceedings of the VLDB Endowment
Avalanche: a fine-grained flow graph model for irregular applications on distributed-memory systems
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
PQL: a purely-declarative java extension for parallel programming
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Auto-parallelizing stateful distributed streaming applications
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Fay: Extensible Distributed Tracing from Kernels to Clusters
ACM Transactions on Computer Systems (TOCS)
Spotting code optimizations in data-parallel pipelines through PeriSCOPE
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
Scripting distributed scientific workflows using Weaver
Concurrency and Computation: Practice & Experience
Coflow: a networking abstraction for cluster applications
Proceedings of the 11th ACM Workshop on Hot Topics in Networks
Using clouds for MapReduce measurement assignments
ACM Transactions on Computing Education (TOCE)
Just-in-time data distribution for analytical query processing
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Cogset: a high performance MapReduce engine
Concurrency and Computation: Practice & Experience
Reify your collection queries for modularity and speed!
Proceedings of the 12th annual international conference on Aspect-oriented software development
BigBench: towards an industry standard benchmark for big data analytics
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Simulation of database-valued markov chains using SimSQL
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
TimeStream: reliable stream computation in the cloud
Proceedings of the 8th ACM European Conference on Computer Systems
Optimus: a dynamic rewriting framework for data-parallel execution plans
Proceedings of the 8th ACM European Conference on Computer Systems
A bloat-aware design for big data applications
Proceedings of the 2013 international symposium on memory management
Boa: a language and infrastructure for analyzing ultra-large-scale software repositories
Proceedings of the 2013 International Conference on Software Engineering
A characteristic study on failures of production distributed data-parallel programs
Proceedings of the 2013 International Conference on Software Engineering
Large-scale computation not at the cost of expressiveness
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
Task fusion: improving utilization of multi-user clusters
Proceedings of the 2013 companion publication for conference on Systems, programming, & applications: software for humanity
Forge: generating a high performance DSL implementation from a declarative specification
Proceedings of the 12th international conference on Generative programming: concepts & experiences
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
The shape of things to run: compiling complex stream graphs to reconfigurable hardware in lime
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Representing mapreduce optimisations in the nested relational calculus
BNCOD'13 Proceedings of the 29th British National conference on Big Data
A scalable approach to column-based low-rank matrix approximation
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Scalable, example-based refactorings with refaster
Proceedings of the 2013 ACM workshop on Workshop on refactoring tools
Hi-index | 0.03 |
MapReduce and similar systems significantly ease the task of writing data-parallel code. However, many real-world computations require a pipeline of MapReduces, and programming and managing such pipelines can be difficult. We present FlumeJava, a Java library that makes it easy to develop, test, and run efficient data-parallel pipelines. At the core of the FlumeJava library are a couple of classes that represent immutable parallel collections, each supporting a modest number of operations for processing them in parallel. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies. To enable parallel operations to run efficiently, FlumeJava defers their evaluation, instead internally constructing an execution plan dataflow graph. When the final results of the parallel operations are eventually needed, FlumeJava first optimizes the execution plan, and then executes the optimized operations on appropriate underlying primitives (e.g., MapReduces). The combination of high-level abstractions for parallel data and computation, deferred evaluation and optimization, and efficient parallel primitives yields an easy-to-use system that approaches the efficiency of hand-optimized pipelines. FlumeJava is in active use by hundreds of pipeline developers within Google.