Algorithms for loading parallel grid files
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
This paper considers the problem of bulk-loading large data sets for the gridfile multiattribute indexing technique. We propose a rectilinear partitioning algorithm that heuristically seeks to minimize the size of the gridfile needed to ensure that no bucket overflows. Empirical studies on both synthetic data sets and data sets drawn from computational fluid dynamics applications demonstrate that our algorithm is very efficient and is able to handle large data sets. In addition, we present an algorithm for bulk-loading data sets too large to fit in main memory. Using a sort of the entire data set, it creates a gridfile without incurring any overflows.
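To make the idea of rectilinear partitioning under a bucket-capacity constraint concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper, whose details the abstract does not give. As a stand-in, it places equal-count (quantile) cuts along each of two axes and increases the number of cuts until no grid cell holds more than `capacity` points. The names `quantile_cuts`, `max_cell_load`, `rectilinear_partition`, and the `max_parts` limit are all hypothetical, and the sketch assumes two-dimensional points that fit in memory, with one bucket per grid cell.

```python
from bisect import bisect_right
from collections import Counter


def quantile_cuts(values, parts):
    """Return up to parts-1 strictly increasing cut points taken at equal-count ranks."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, parts):
        cut = ordered[(i * n) // parts]
        # Skip repeated values so the cut list stays strictly increasing.
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def max_cell_load(points, x_cuts, y_cuts):
    """Count the points falling in each grid cell and return the largest count."""
    counts = Counter(
        (bisect_right(x_cuts, x), bisect_right(y_cuts, y)) for x, y in points
    )
    return max(counts.values()) if counts else 0


def rectilinear_partition(points, capacity, max_parts=1024):
    """Assumed heuristic: grow a quantile grid until no cell exceeds capacity.

    Returns (x_cuts, y_cuts). Raises ValueError if no overflow-free grid is
    found within max_parts, for example when more than `capacity` points
    share the same coordinates.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    for parts in range(1, max_parts + 1):
        x_cuts = quantile_cuts(xs, parts)
        y_cuts = quantile_cuts(ys, parts)
        if max_cell_load(points, x_cuts, y_cuts) <= capacity:
            return x_cuts, y_cuts
    raise ValueError("no overflow-free grid found within max_parts")


if __name__ == "__main__":
    # Sixteen points on a diagonal, at most four points per bucket.
    diagonal = [(i, i) for i in range(16)]
    print(rectilinear_partition(diagonal, capacity=4))
```

The diagonal input shows why the grid size matters: the sketch ends up with a 4-by-4 grid in which only the four diagonal cells hold data, so skewed or correlated data can force many empty cells. Reducing that overhead is the kind of objective a partitioning heuristic aims at.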