Servicing range queries on multidimensional datasets with partial replicas

Authors:
L. Weng;U. Catalyurek;T. Kurc;Gagan Agrawal;J. Saltz
Affiliations:
The Ohio State University;The Ohio State University;The Ohio State University;The Ohio State University;The Ohio State University
Venue:
CCGRID '05 Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2
Year:
2005

Citing 0
Cited 3

Optimizing multiple queries on scientific datasets with partial replicas
GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing

A Fast Location Service for Partial Spatial Replicas
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing

Towards dynamic data-driven management of the ruby gulch waste repository
ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part III

Abstract

Partial replication is an optimization that speeds up the execution of queries submitted against large datasets. In partial replication, a portion of the dataset is extracted, reorganized, and redistributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, a query execution plan should be generated that uses the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas that reduces query execution time. We show the efficiency of the proposed method using datasets and queries from oil reservoir simulation studies on a cluster machine.
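The abstract does not spell out the heuristic, so the following is only an illustrative sketch of the general idea of picking replicas for a range query, not the authors' algorithm. It assumes a simple cost model invented here for illustration: each replica covers an axis-aligned box of integer grid cells and has a hypothetical fixed startup cost plus a per-cell read cost, and the original dataset is treated as one more (expensive) replica that covers everything. The selection rule is the standard greedy weighted set cover heuristic: repeatedly take the replica with the lowest cost per newly covered cell.

```python
from itertools import product

def cells(box):
    """Enumerate the integer grid cells of a box ((lo, hi), ...); hi is exclusive."""
    return set(product(*(range(lo, hi) for lo, hi in box)))

def intersect(a, b):
    """Return the intersection of two boxes, or None if they do not overlap."""
    out = tuple((max(al, bl), min(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))
    return out if all(lo < hi for lo, hi in out) else None

def choose_replicas(query, replicas):
    """Greedy plan for a range query.

    replicas maps a name to (box, startup_cost, cost_per_cell); all three
    fields are assumptions of this sketch, not part of the paper.
    Returns a list of (replica_name, cells_read_from_it).
    """
    uncovered = cells(query)
    plan = []
    while uncovered:
        best = None
        for name, (box, startup, per_cell) in replicas.items():
            overlap = intersect(box, query)
            if overlap is None:
                continue
            gain = cells(overlap) & uncovered  # cells this replica would newly serve
            if not gain:
                continue
            ratio = (startup + per_cell * len(gain)) / len(gain)
            if best is None or ratio < best[0]:
                best = (ratio, name, gain)
        if best is None:
            raise ValueError("query is not fully covered by the available replicas")
        _, name, gain = best
        plan.append((name, len(gain)))
        uncovered -= gain
    return plan

# Hypothetical 2-D example: a 4x4 query, two partial replicas, and the original dataset.
query = ((0, 4), (0, 4))
replicas = {
    "original": (((0, 10), (0, 10)), 1.0, 4.0),
    "r1": (((0, 4), (0, 2)), 1.0, 1.0),
    "r2": (((2, 6), (0, 4)), 1.0, 2.0),
}
plan = choose_replicas(query, replicas)
print(plan)  # [('r1', 8), ('r2', 4), ('original', 4)]
```

In this example the cheapest replica serves its half of the query, the second replica serves the part of the remainder it overlaps, and only the cells no partial replica holds fall back to the original dataset. A real planner would work on chunk metadata rather than enumerating cells and would account for I/O parallelism across storage nodes, which this sketch ignores.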