Data Stream Management Systems (DSMS) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently far exceeds the computational capabilities of a single centralized server. In particular, a single-server instance of our DSMS, Gigascope, cannot keep up with the processing demands of the new OC-768 networks, which can generate more than 100 million packets per second. In this paper, we explore a mechanism for the distributed processing of very high speed data streams. Existing distributed DSMSs employ two mechanisms for distributing the load across the participating machines: partitioning of the query execution plans and partitioning of the input data stream in a query-independent fashion. However, for a large class of queries, both approaches fail to reduce the load compared to a centralized system, and can even increase it. We present an alternative approach, query-aware data stream partitioning, that allows for more efficient scaling. We present methods for analyzing any given query set and choosing the optimal partitioning scheme, and show how to reconcile the potentially conflicting requirements that different queries might place on partitioning. We conclude with experiments on a small cluster of processing nodes fed by a high-rate network traffic feed; across different query sets, the experiments demonstrate that our methods effectively distribute the load across all processing nodes and facilitate efficient scaling whenever more processing nodes become available.
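The abstract does not spell out the partitioning mechanism, so the following is only a minimal sketch of the general idea under stated assumptions: queries are group-by aggregations, the stream is hash-partitioned on attributes shared by every query's group-by set (a simplified way to reconcile conflicting requirements), and a round-robin splitter stands in for query-independent partitioning. All function names and the packet fields `src` and `dst` are hypothetical and are not taken from Gigascope.

```python
import zlib


def reconcile(group_by_sets):
    """Return the attributes common to every query's group-by set.

    Assumption: partitioning on attributes shared by all queries keeps each
    query's groups on a single node. An empty result means this simple rule
    finds no partitioning compatible with all queries.
    """
    common = set(group_by_sets[0])
    for attrs in group_by_sets[1:]:
        common &= set(attrs)
    return sorted(common)


def query_aware_node(packet, attrs, num_nodes):
    """Route a packet by a stable hash of the chosen partitioning attributes."""
    key = "|".join(str(packet[a]) for a in attrs)
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def round_robin_node(index, num_nodes):
    """Query-independent routing: spread packets evenly, ignoring content."""
    return index % num_nodes


def partial_groups(packets, group_by, assign):
    """Count distinct (node, group) pairs, i.e. partial aggregates produced.

    If this equals the number of distinct groups, every group is computed
    entirely on one node and no cross-node merge step is needed.
    """
    seen = set()
    for index, packet in enumerate(packets):
        node = assign(index, packet)
        seen.add((node, tuple(packet[a] for a in group_by)))
    return len(seen)


if __name__ == "__main__":
    nodes = 4
    # Synthetic traffic: 50 (src, dst) flows with 8 packets each.
    packets = [{"src": s, "dst": d}
               for s in range(10) for d in range(5) for _ in range(8)]
    attrs = reconcile([("src", "dst"), ("src",)])
    aware = partial_groups(packets, ("src", "dst"),
                           lambda i, p: query_aware_node(p, attrs, nodes))
    blind = partial_groups(packets, ("src", "dst"),
                           lambda i, p: round_robin_node(i, nodes))
    print("partitioning attributes:", attrs)
    print("partial aggregates, query-aware:", aware)
    print("partial aggregates, round-robin:", blind)
```

In this toy setting the query-aware splitter yields one partial aggregate per flow, while round-robin scatters each flow across all nodes, so the partial results must be shipped to and merged on a further node. That extra merge work is the kind of load increase the abstract attributes to query-independent partitioning.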