dQUOB: Managing Large Data Flows Using Dynamic Embedded Queries

Authors:
Beth Plale; Karsten Schwan
Venue:
HPDC '00 Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing
Year:
2000

Cited by (18):

Processing large-scale multi-dimensional data in parallel and distributed environments
Parallel Computing - Parallel data-intensive algorithms and applications

Efficient Manipulation of Large Datasets on Heterogeneous Storage Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium

Optimizations Enabled by Relational Data Model View to Querying Data Streams
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium

Taking the Step From Meta-Information to Communication Middleware in Computational Data Streams
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium

Adaptive Query Processing: A Survey
BNCOD 19 Proceedings of the 19th British National Conference on Databases: Advances in Databases

Active Proxy-G: optimizing the query execution process in the grid
Proceedings of the 2002 ACM/IEEE conference on Supercomputing

Executing multiple pipelined data analysis operations in the grid
Proceedings of the 2002 ACM/IEEE conference on Supercomputing

Dynamic Querying of Streaming Data with the dQUOB System
IEEE Transactions on Parallel and Distributed Systems

Leveraging Run Time Knowledge about Event Rates to Improve Memory Utilization in Wide Area Data Stream Filtering
HPDC '02 Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing

On Network CoProcessors for Scalable, Predictable Media Services
IEEE Transactions on Parallel and Distributed Systems

Use of PVFS for Efficient Execution of Jobs with Pipeline-Shared I/O
GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing

Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism
Proceedings of the 2003 ACM/IEEE conference on Supercomputing

Optimizing Reduction Computations In a Distributed Environment
Proceedings of the 2003 ACM/IEEE conference on Supercomputing

Evaluation of Rate-Based Adaptivity in Asynchronous Data Stream Joins
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01

Improving Data Access for Computational Grid Applications
Cluster Computing

I-RMI: performance isolation in information flow applications
Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware

Driving scientific applications by data in distributed environments
ICCS'03 Proceedings of the 2003 international conference on Computational science

I-RMI: performance isolation in information flow applications
Middleware'05 Proceedings of the ACM/IFIP/USENIX 6th international conference on Middleware

Abstract

The dQUOB system satisfies clients' need for specific information from high-volume data streams. The data streams we consider are the flows of data that arise during large-scale visualizations, video streaming to large numbers of distributed users, and high-volume business transactions. We introduce the notion of conceptualizing a data stream as a set of relational database tables so that a scientist can request information with an SQL-like query. Transformation or computation that often needs to be performed on the data en route can be conceptualized as computation performed on consecutive views of the data, with computation associated with each view. The dQUOB system moves the query code into the data stream as a quoblet, in the form of compiled code. The relational data model has the significant advantage of presenting opportunities for efficient re-optimization of queries and sets of queries. Using examples from global atmospheric modeling, we illustrate the usefulness of the dQUOB system. We carry the examples through the experiments to establish, with a baseline benchmark, the viability of the approach for high-performance computing. We define a cost metric of end-to-end latency that can be used to determine realistic cases where optimization should be applied. Finally, we show that end-to-end latency can be controlled through a probability, assigned to each query, that the query will evaluate to true.
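
To make the "data stream as relational table" idea concrete, the following is a minimal sketch, not the dQUOB API: it treats each event in a stream as a row and applies a filter roughly equivalent to `SELECT lat, lon, ozone FROM stream WHERE lat > 30 AND ozone > 0.25`. The helper `make_query`, the field names, and the sample atmospheric records are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# One stream event, viewed as a row of a relational table (hypothetical schema).
Event = Dict[str, float]


def make_query(
    select: List[str],
    where: Callable[[Event], bool],
) -> Callable[[Iterable[Event]], Iterator[Event]]:
    """Build a stream filter that keeps matching rows and projects columns.

    Roughly: SELECT <select> FROM stream WHERE <where>.
    """

    def run(stream: Iterable[Event]) -> Iterator[Event]:
        for event in stream:
            if where(event):
                # Project only the requested columns.
                yield {name: event[name] for name in select}

    return run


# Hypothetical atmospheric records flowing through the stream.
stream = [
    {"lat": 10.0, "lon": 20.0, "level": 1.0, "ozone": 0.31},
    {"lat": 45.0, "lon": 20.0, "level": 2.0, "ozone": 0.42},
    {"lat": 50.0, "lon": 25.0, "level": 2.0, "ozone": 0.18},
]

query = make_query(
    ["lat", "lon", "ozone"],
    lambda e: e["lat"] > 30.0 and e["ozone"] > 0.25,
)
result = list(query(stream))
```

Because the filter runs as each event arrives, only the rows a client asked for travel further downstream; the abstract describes dQUOB as achieving this by placing compiled query code in the stream, which this interpreted sketch does not attempt to reproduce.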