Cloudy: heterogeneous middleware for in time queries processing

Authors:
Pedro Martins;Maryam Abbasi;Pedro Furtado
Affiliations:
University of Coimbra, Coimbra, Portugal;University of Coimbra, Coimbra, Portugal;University of Coimbra, Coimbra, Portugal
Venue:
Proceedings of the 17th International Database Engineering & Applications Symposium
Year:
2013

Abstract

Parallel shared-nothing architectures are currently used to handle large amounts of data arriving in real time for processing. The continuous increase in data volume and organization introduces several limitations to scalability and quality of service (QoS) due to processing requirements and joins. Parallelism may improve query performance; however, some businesses require timely results (results neither faster nor slower than specified), which cannot be guaranteed even with additional parallelism and significant upgrade costs (both monetary and due to disturbance of normal operations). We propose a timely-aware execution architecture, Cloudy, which balances data and query processing among an elastic set of non-dedicated and heterogeneous nodes in order to provide scale-out performance and timely results, neither faster nor slower than specified, using both Complex Event Processing (CEP) and a database (DB). Data is distributed across nodes according to their hardware characteristics; a set of layered mechanisms then rearranges queries in order to provide timely results. We present an experimental evaluation of Cloudy and demonstrate its ability to provide timely results.
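
The abstract does not say how data is apportioned among heterogeneous nodes. As a rough illustration only, the sketch below splits a number of rows in proportion to a per-node capacity score. The function name `split_by_capacity`, the single numeric score per node, and the largest-remainder rounding are all assumptions made for this example; they are not taken from the paper and may differ from what Cloudy actually does.

```python
from typing import Dict


def split_by_capacity(n_rows: int, capacity: Dict[str, float]) -> Dict[str, int]:
    """Split n_rows across nodes in proportion to a capacity score.

    Hypothetical helper: the score stands in for whatever hardware
    characteristics a real system would measure (CPU, memory, I/O).
    Largest-remainder rounding keeps the shares summing to n_rows.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if any(score < 0 for score in capacity.values()):
        raise ValueError("capacity scores must be non-negative")
    total = sum(capacity.values())
    if total <= 0:
        raise ValueError("at least one node needs a positive capacity")

    # Ideal (fractional) share of rows for each node.
    exact = {node: n_rows * score / total for node, score in capacity.items()}
    # Round every share down, then hand out the rows lost to rounding.
    shares = {node: int(value) for node, value in exact.items()}
    leftover = n_rows - sum(shares.values())

    # Nodes with the largest fractional part get the leftover rows first.
    by_fraction = sorted(
        exact, key=lambda node: exact[node] - shares[node], reverse=True
    )
    for node in by_fraction[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Example: a node scored 3.0 receives three times the rows of one scored 1.0.
    print(split_by_capacity(100, {"slow": 1.0, "fast": 3.0}))
```

A static split like this covers only the placement step. Meeting a deadline that results must neither beat nor miss would also need the query-level mechanisms the abstract mentions, which this sketch does not attempt.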