New Challenges in Petascale Scientific Databases

  • Authors: Alexander Szalay
  • Affiliations: Department of Physics and Astronomy, The Johns Hopkins University, Baltimore, MD 21218
  • Venue: SSDBM '08: Proceedings of the 20th International Conference on Scientific and Statistical Database Management
  • Year: 2008

Abstract

Scientific data is doubling every year. Virtual Observatories are being established at every scale of the physical world: from elementary particles to materials, biological systems, environmental observatories, remote sensing, and the universe. These collaborations collect ever-increasing amounts of data, often at rates approaching petabytes per year. Many scientists will soon obtain most of their data from large scientific repositories, often stored as databases. The talk will discuss the differing requirements for such databases and examine user behavior in a few concrete examples taken from astronomy, in particular from six years of usage of the Sloan Digital Sky Survey database. Interesting query patterns are emerging, in which users create custom "crawlers" to break large queries into many small, repetitive ones. The trial-and-error behavior of many exploratory projects will also be discussed. The talk will also present various scalable alternatives to large scientific analysis facilities.
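
The "crawler" pattern mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say how users actually build their crawlers, so everything here is an assumption made for illustration: the table name `objects`, the coordinate column `ra`, and the helper functions `tile_ranges` and `crawler_queries` are all hypothetical. The idea shown is only that one large range query is replaced by many small queries over adjacent, non-overlapping intervals.

```python
def tile_ranges(lo, hi, n_tiles):
    """Split [lo, hi) into n_tiles contiguous half-open intervals."""
    if n_tiles <= 0:
        raise ValueError("n_tiles must be positive")
    width = (hi - lo) / n_tiles
    # Compute each edge from lo directly rather than by repeated addition,
    # so floating-point error does not accumulate across tiles.
    edges = [lo + i * width for i in range(n_tiles)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


def crawler_queries(table, column, lo, hi, n_tiles):
    """Build one small range query per tile instead of one large scan.

    The table and column names are hypothetical placeholders, not the
    schema of any real survey database.
    """
    template = "SELECT * FROM {t} WHERE {c} >= {a} AND {c} < {b}"
    return [
        template.format(t=table, c=column, a=a, b=b)
        for a, b in tile_ranges(lo, hi, n_tiles)
    ]


# Example: cover a 0-360 degree coordinate range with four small queries.
queries = crawler_queries("objects", "ra", 0.0, 360.0, 4)
for q in queries:
    print(q)
```

Half-open intervals are used so that a row on a tile boundary is returned by exactly one of the small queries; a real crawler would also need to submit the queries and merge the results, which this sketch leaves out.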