We have developed the Active Data Repository (ADR), an infrastructure that integrates storage, retrieval, and processing of large multi-dimensional scientific datasets on distributed-memory parallel machines with multiple disks attached to each node. In earlier work, we proposed three strategies for processing range queries within the ADR framework. Our experimental results show that the relative performance of the strategies changes under varying application characteristics and machine configurations. In this work, we investigate approaches to guide and automate the selection of the best strategy for a given application and machine configuration. We describe analytical models that predict the relative performance of the strategies when input data elements are uniformly distributed in the attribute space of the output dataset and the output dataset is restricted to a regular d-dimensional array.