Scalable execution of continuous queries over massive data streams often requires splitting input streams into parallel sub-streams, over which query operators are then executed in parallel. Automatic stream splitting is in general very difficult, because the optimal parallelization may depend on application semantics. To enable application-specific stream splitting, we introduce splitstream functions, in which the user specifies stream partitioning and replication non-procedurally. For high-volume streams, the stream splitting itself becomes a performance bottleneck. We introduce a cost model that estimates the performance of splitstream functions with respect to throughput and CPU usage. We implement parallel splitstream functions and relate the experimental results to the cost model estimates. Based on these results, we propose a splitstream function called autosplit, which scales well to high degrees of parallelism and is robust to varying proportions of stream partitioning and replication. We show that user-defined parallelization using autosplit provides substantially improved scalability (L = 64) over previously published results for the Linear Road Benchmark.
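To make the distinction between partitioning and replication concrete, here is a minimal Python sketch of splitting one stream into parallel sub-streams. It is not the splitstream interface or the autosplit implementation described above: the function `split_stream`, its parameters `partition_key` and `replicate`, and the sample tuples are all hypothetical names chosen for illustration, and the sketch runs sequentially on in-memory lists rather than on a real stream.

```python
from typing import Any, Callable, Iterable, List


def split_stream(
    tuples: Iterable[Any],
    n: int,
    partition_key: Callable[[Any], int],
    replicate: Callable[[Any], bool],
) -> List[List[Any]]:
    """Split one input stream into n sub-streams (illustrative only).

    A tuple for which replicate(t) is true is copied to every sub-stream;
    any other tuple is routed to exactly one sub-stream, chosen by
    partition_key(t) modulo n.
    """
    substreams: List[List[Any]] = [[] for _ in range(n)]
    for t in tuples:
        if replicate(t):
            # Replication: every parallel operator sees this tuple.
            for s in substreams:
                s.append(t)
        else:
            # Partitioning: exactly one parallel operator sees this tuple.
            substreams[partition_key(t) % n].append(t)
    return substreams


# Hypothetical tuples of the form (kind, expressway_id, payload).
stream = [
    ("position", 0, "car-17"),
    ("position", 1, "car-42"),
    ("query", 0, "toll-history"),
    ("position", 2, "car-99"),
]

# Assumed policy: partition position reports by expressway, replicate queries.
parts = split_stream(
    stream,
    n=2,
    partition_key=lambda t: t[1],
    replicate=lambda t: t[0] == "query",
)
```

In this sketch the caller states only which tuples are replicated and which key drives partitioning, and the routing loop stays generic. The loop also hints at why splitting can become a bottleneck for high-volume streams: each replicated tuple costs one append per sub-stream, so the work per tuple grows with the degree of parallelism.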