Efficiently updating materialized views
SIGMOD '86 Proceedings of the 1986 ACM SIGMOD international conference on Management of data
ACM Transactions on Database Systems (TODS)
Parallel database systems: the future of high performance database systems
Communications of the ACM
Continuous queries over append-only databases
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
A categorized bibliography on incremental computation
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Efficient and extensible algorithms for multi query optimization
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Prototyping Bubba, A Highly Parallel Database System
IEEE Transactions on Knowledge and Data Engineering
The Gamma Database Machine Project
IEEE Transactions on Knowledge and Data Engineering
Automated Selection of Materialized Views and Indexes in SQL Databases
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Redbrick Vista: Aggregate Computation and Management
ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Efficient exploitation of similar subexpressions for query processing
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Cooperative scans: dynamic bandwidth sharing in a DBMS
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Automatic optimization of parallel dataflow programs
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Scheduling shared scans of large data files
Proceedings of the VLDB Endowment
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Parallel Evaluation of Composite Aggregate Queries
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
DryadInc: reusing work in large-scale computations
HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Nectar: automatic management of data and computation in datacenters
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Nova: continuous Pig/Hadoop workflows
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
The case for being lazy: how to leverage lazy evaluation in MapReduce
Proceedings of the 2nd international workshop on Scientific cloud computing
In-situ MapReduce for log processing
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Incoop: MapReduce for incremental computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
PrIter: a distributed framework for prioritized iterative computations
Proceedings of the 2nd ACM Symposium on Cloud Computing
Proceedings of the 2nd ACM SIGSPATIAL International Workshop on GeoStreaming
An overview of CMPI: network performance aware MPI in the cloud
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Kineograph: taking the pulse of a fast-changing and connected world
Proceedings of the 7th ACM european conference on Computer Systems
In-situ MapReduce for log processing
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Shredder: GPU-accelerated incremental storage and computation
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Optimizing data shuffling in data-parallel computation by understanding user-defined functions
NSDI'12 Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation
Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Ccomputing
Facilitating real-time graph mining
Proceedings of the fourth international workshop on Cloud data management
Improving large graph processing on partitioned graphs in the cloud
Proceedings of the Third ACM Symposium on Cloud Computing
Streaming big data with self-adjusting computation
DDFP '13 Proceedings of the 2013 workshop on Data driven functional programming
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Discretized streams: fault-tolerant streaming computation at scale
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
MROrder: flexible job ordering optimization for online mapreduce workloads
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
Batched stream processing is a new distributed data processing paradigm that models recurring batch computations on incrementally bulk-appended data streams. The model is inspired by our empirical study on a trace from a large-scale production data-processing cluster; it allows a set of effective query optimizations that are not possible in a traditional batch processing model. We have developed a query processing system called Comet that embraces batched stream processing and integrates with DryadLINQ. We used two complementary methods to evaluate the effectiveness of optimizations that Comet enables. First, a prototype system deployed on a 40-node cluster shows an I/O reduction of over 40% using our benchmark. Second, when applied to a real production trace covering over 19 million machine-hours, our simulator shows an estimated I/O saving of over 50%.