Advanced instruments in a variety of scientific domains are collecting massive amounts of data that must be postprocessed and organized to support research activities. Astronomers have been pioneers in the use of databases to host sky survey data. Increasing data volumes from more powerful telescopes pose enormous challenges to state-of-the-art database systems and data-loading techniques. In this paper we present SkyLoader, our novel framework for data loading that is being used to populate a multi-table, multi-terabyte database repository for the Palomar-Quest sky survey. SkyLoader consists of an efficient algorithm for bulk loading, an effective data structure to support data integrity, optimized parallelism, and guidelines for system tuning. Performance studies show the positive effects of these techniques, with load time for a 40-gigabyte data set reduced from over 20 hours to less than 3 hours. Our framework offers a promising approach for loading other large and complex scientific databases.