Maintaining views incrementally
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Predicate migration: optimizing queries with expensive predicates
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Optimization of parallel query execution plans in XPRS
Distributed and Parallel Databases - Selected papers from the first international conference on parallel and distributed information systems
NiagaraCQ: a scalable continuous query system for Internet databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
R* Optimizer Validation and Performance Evaluation for Distributed Queries
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Including Group-By in Query Optimization
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
The Volcano Optimizer Generator: Extensibility and Efficient Search
Proceedings of the Ninth International Conference on Data Engineering
Exploiting Punctuation Semantics in Continuous Data Streams
IEEE Transactions on Knowledge and Data Engineering
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Querying the internet with PIER
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Recursive Computation of Regions and Connectivity in Networks
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads
Proceedings of the VLDB Endowment
Hadoop: The Definitive Guide
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Hyracks: A flexible and extensible foundation for data-intensive computing
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
Making every bit count in wide-area analytics
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Large-scale computation not at the cost of expressiveness
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
i2MapReduce: incremental iterative MapReduce
Proceedings of the 2nd International Workshop on Cloud Intelligence
"All roads lead to Rome": optimistic recovery for distributed iterative data processing
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Distributed socialite: a datalog-based language for large-scale graph analysis
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
In today's Web and social network environments, query workloads include ad hoc and OLAP queries, as well as iterative algorithms that analyze data relationships (e.g., link analysis, clustering, learning). Modern DBMSs support ad hoc and OLAP queries, but most are not robust enough to scale to large clusters. Conversely, "cloud" platforms like MapReduce execute chains of batch tasks across clusters in a fault tolerant way, but have too much overhead to support ad hoc queries. Moreover, both classes of platform incur significant overhead in executing iterative data analysis algorithms. Most such iterative algorithms repeatedly refine portions of their answers, until some convergence criterion is reached. However, general cloud platforms typically must reprocess all data in each step. DBMSs that support recursive SQL are more efficient in that they propagate only the changes in each step --- but they still accumulate each iteration's state, even if it is no longer useful. User-defined functions are also typically harder to write for DBMSs than for cloud platforms. We seek to unify the strengths of both styles of platforms, with a focus on supporting iterative computations in which changes, in the form of deltas, are propagated from iteration to iteration, and state is efficiently updated in an extensible way. We present a programming model oriented around deltas, describe how we execute and optimize such programs in our REX runtime system, and validate that our platform also handles failures gracefully. We experimentally validate our techniques, and show speedups over the competing methods ranging from 2.5 to nearly 100 times.