Optimized Data Loading for a Multi-Terabyte Sky Survey Repository

  • Authors:
  • Y. Dora Cai;Ruth Aydt;Robert J. Brunner

  • Affiliations:
  • National Center for Supercomputing Applications (NCSA);National Center for Supercomputing Applications (NCSA);University of Illinois at Urbana-Champaign

  • Venue:
  • SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Advanced instruments in a variety of scientific domains are collecting massive amounts of data that must be postprocessed and organized to support research activities. Astronomers have been pioneers in the use of databases to host sky survey data. Increasing data volumes from more powerful telescopes pose enormous challenges to state-ofthe- art database systems and data-loading techniques. In this paper we present SkyLoader, our novel framework for data loading that is being used to populate a multi-table, multi-terabyte database repository for the Palomar-Quest sky survey. SkyLoader consists of an efficient algorithm for bulk loading, an effective data structure to support data integrity, optimized parallelism, and guidelines for system tuning. Performance studies show the positive effects of these techniques, with load time for a 40-gigabyte data set reduced from over 20 hours to less than 3 hours. Our framework offers a promising approach for loading other large and complex scientific databases.