SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
A Symmetric Fragment and Replicate Algorithm for Distributed Joinsyout
IEEE Transactions on Parallel and Distributed Systems
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
An Overview of The System Software of A Parallel Relational Database Machine GRACE
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
GAMMA - A High Performance Dataflow Database Machine
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
A comparison of approaches to large-scale data analysis
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Nephele: efficient parallel data processing in the cloud
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Massively parallel data analysis with PACTs on Nephele
Proceedings of the VLDB Endowment
Map-reduce extensions and recursive queries
Proceedings of the 14th International Conference on Extending Database Technology
ASTERIX: towards a scalable, semistructured data platform for evolving-world models
Distributed and Parallel Databases
CloudFuice: a flexible cloud-based data integration system
ICWE'11 Proceedings of the 11th international conference on Web engineering
ChuQL: processing XML with XQuery using Hadoop
Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
Parallel data processing with MapReduce: a survey
ACM SIGMOD Record
Cluster computing, recursion and datalog
Datalog'10 Proceedings of the First international conference on Datalog Reloaded
The HaLoop approach to large-scale iterative data analysis
The VLDB Journal — The International Journal on Very Large Data Bases
SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Optimizing analytic data flows for multiple execution engines
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Inside "Big Data management": ogres, onions, or parfaits?
Proceedings of the 15th International Conference on Extending Database Technology
Clydesdale: structured data processing on MapReduce
Proceedings of the 15th International Conference on Extending Database Technology
An optimization framework for map-reduce queries
Proceedings of the 15th International Conference on Extending Database Technology
Transitive closure and recursive Datalog implemented on clusters
Proceedings of the 15th International Conference on Extending Database Technology
Adaptive MapReduce using situation-aware mappers
Proceedings of the 15th International Conference on Extending Database Technology
Integrating open government data with stratosphere for more transparency
Web Semantics: Science, Services and Agents on the World Wide Web
Massively-parallel stream processing under QoS constraints with Nephele
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Proceedings of the 2012 Joint EDBT/ICDT Workshops
Efficient multi-way theta-join processing using MapReduce
Proceedings of the VLDB Endowment
Opening the black boxes in data flow optimization
Proceedings of the VLDB Endowment
Spinning fast iterative data flows
Proceedings of the VLDB Endowment
Optimization of analytic data flows for next generation business intelligence applications
TPCTC'11 Proceedings of the Third TPC Technology conference on Topics in Performance Evaluation, Measurement and Characterization
Myriad: parallel data generation on shared-nothing architectures
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
SCOPE: parallel databases meet MapReduce
The VLDB Journal — The International Journal on Very Large Data Bases
AMADA: web data repositories in the amazon cloud
Proceedings of the 21st ACM international conference on Information and knowledge management
Proceedings of the 1st International Workshop on Model-Driven Engineering for High Performance and CLoud computing
Sparkler: supporting large-scale matrix factorization
Proceedings of the 16th International Conference on Extending Database Technology
Iterative parallel data processing with stratosphere: an inside look
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Designing a database system for modern processing architectures
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
A case for dynamic memory partitioning in data centers
Proceedings of the Second Workshop on Data Analytics in the Cloud
Reference representation techniques for large models
Proceedings of the Workshop on Scalability in Model Driven Engineering
Large-scale social-media analytics on stratosphere
Proceedings of the 22nd international conference on World Wide Web companion
Data-Fu: a language and an interpreter for interaction with read/write linked data
Proceedings of the 22nd international conference on World Wide Web
Large-scale computation not at the cost of expressiveness
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Mammoth: autonomic data processing framework for scientific state-transition applications
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference
"All roads lead to Rome": optimistic recovery for distributed iterative data processing
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
PonIC: using stratosphere to speed up pig analytics
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Continuous cloud-scale query optimization and processing
Proceedings of the VLDB Endowment
Hardware-oblivious parallelism for in-memory column-stores
Proceedings of the VLDB Endowment
Scalable topic-specific influence analysis on microblogs
Proceedings of the 7th ACM international conference on Web search and data mining
Approaches to Distributed Execution of Scientific Workflows in Kepler
Fundamenta Informaticae - Scalable Workflow Enactment Engines and Technology
Hybrid Analytic Flows-the Case for Optimization
Fundamenta Informaticae - Scalable Workflow Enactment Engines and Technology
Nephele streaming: stream processing under QoS constraints at scale
Cluster Computing
Hi-index | 0.00 |
We present a parallel data processor centered around a programming model of so called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele [18]. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with further second-order functions, as well as with Output Contracts that give guarantees about the behavior of a function. We describe methods to transform a PACT program into a data flow for Nephele, which executes its sequential building blocks in parallel and deals with communication, synchronization and fault tolerance. Our definition of PACTs allows to apply several types of optimizations on the data flow during the transformation. The system as a whole is designed to be as generic as (and compatible to) map/reduce systems, while overcoming several of their major weaknesses: 1) The functions map and reduce alone are not sufficient to express many data processing tasks both naturally and efficiently. 2) Map/reduce ties a program to a single fixed execution strategy, which is robust but highly suboptimal for many tasks. 3) Map/reduce makes no assumptions about the behavior of the functions. Hence, it offers only very limited optimization opportunities. With a set of examples and experiments, we illustrate how our system is able to naturally represent and efficiently execute several tasks that do not fit the map/reduce model well.