Execution and optimization of continuous queries with cyclops

Authors:
Harold Lim;Shivnath Babu
Affiliations:
Duke University, Durham, NC, USA;Duke University, Durham, NC, USA
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Citing 8
Cited 0

NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Continuous queries over data streams

ACM SIGMOD Record
SmartCIS: integrating digital and physical environments

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
S4: Distributed Stream Computing Platform

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
Apache hadoop goes realtime at Facebook

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Temporal Analytics on Big Data for Web Advertising

ICDE '12 Proceedings of the 2012 IEEE 28th International Conference on Data Engineering
Muppet: MapReduce-style processing of fast data

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

As the data collected by enterprises grows in scale, there is a growing trend of performing data analytics on large datasets. Batch processing systems that can handle petabyte scale of data, such as Hadoop, have flourished and gained traction in the industry. As the results of batch analytics have been used to continuously improve front-facing user experience, there is a growing interest in pushing the processing latency down. This trend has fueled a resurgence in the development and usage of execution engines that can process continuous queries. An important class of continuous queries is windowed aggregation queries. Such queries arise in a wide range of applications such as generating personalized content and results. Today, considerable manual effort goes into finding the most suitable execution engine for these queries and on tuning query performance on these engines. An ecosystem composed of multiple execution engines may be needed in order to run the overall query workload efficiently given the diverse set of requirements that arise in practice. Cyclops is a continuous query processing platform that manages and orchestrates windowed aggregation queries in an ecosystem composed of multiple continuous query execution engines. Cyclops employs a cost-based approach for picking the most suitable engine and plan for executing a given query. This demonstration first presents an interactive visualization of the rich execution plan space of windowed aggregation queries, which allows users to analyze and understand the differences among plans. The next part of the demonstration will drill down into the design of Cyclops. For a given query, we show the cost spectrum of query execution plans across three different execution engines---Esper, Storm, and Hadoop---as estimated by Cyclops.