There is an increasing trend towards distributed and shared repositories for storing scientific datasets. Developing applications that retrieve and process data from such repositories involves a number of challenges. First, these data repositories store data in complex, low-level layouts, which should be abstracted from application developers. Second, as data repositories are shared resources, part of the computation on the data must be performed on a different set of machines than the ones hosting the data. Third, because of the volume of data and the amount of computation involved, parallel configurations need to be used both for hosting the data and for processing the retrieved data. In this paper, we describe a system for executing SQL-3 queries over scientific data stored as flat files. A relational table-based virtual view is supported on these flat-file datasets. The class of queries we consider involves data retrieval using Select and Where clauses, and processing with user-defined aggregate functions and group-bys. We use a middleware system, STORM, to provide much of the low-level functionality. Our compiler analyzes the SQL-3 queries and generates many of the functions required by this middleware. Our experimental results show good scalability with respect to the number of nodes as well as the dataset size.
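As an illustration only, the class of queries described above (selection with a Where predicate, followed by a group-by and a user-defined aggregate, evaluated over a table-like view of a flat file) might be sketched in Python as follows. The record layout, the field names, and the helpers `scan` and `query` are assumptions made for this sketch; they are not taken from the paper, from STORM, or from the code its compiler generates, and the sketch is sequential rather than parallel.

```python
import struct
from collections import namedtuple

# Hypothetical flat-file layout: fixed-size binary records holding
# (int32 time, float32 x, float32 value), little-endian.
RECORD = struct.Struct("<iff")
Row = namedtuple("Row", ["time", "x", "value"])


def scan(raw):
    """Expose raw flat-file bytes as a stream of rows (the virtual table view)."""
    for fields in RECORD.iter_unpack(raw):
        yield Row(*fields)


def query(rows, where, group_by, init, accumulate, finalize):
    """Evaluate: SELECT key, agg(...) FROM rows WHERE where(row) GROUP BY key."""
    state = {}
    for row in rows:
        if where(row):
            key = group_by(row)
            # Fold the row into the running aggregate state for its group.
            state[key] = accumulate(state.get(key, init()), row)
    return {key: finalize(acc) for key, acc in state.items()}


# Toy dataset standing in for a flat file on disk.
raw = b"".join(
    RECORD.pack(t, x, v)
    for t, x, v in [(1, 0.5, 2.0), (1, 1.5, 4.0), (2, 0.5, 10.0), (2, 3.0, 99.0)]
)

# User-defined aggregate: mean of `value`, carried as a (sum, count) pair.
result = query(
    scan(raw),
    where=lambda r: r.x < 2.0,
    group_by=lambda r: r.time,
    init=lambda: (0.0, 0),
    accumulate=lambda acc, r: (acc[0] + r.value, acc[1] + 1),
    finalize=lambda acc: acc[0] / acc[1],
)
print(result)
```

Splitting the aggregate into `init`, `accumulate`, and `finalize` steps is a common way to express user-defined aggregates. It is chosen here because partial states of this form can in principle be computed on separate nodes and merged, although the merge step is not shown.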