High-performance remote access to climate simulation data: a challenge problem for data grid technologies

  • Authors:
  • Ann Chervenak;Ewa Deelman;Carl Kesselman;Bill Allcock;Ian Foster;Veronika Nefedova;Jason Lee;Alex Sim;Arie Shoshani;Bob Drach;Dean Williams;Don Middleton

  • Affiliations:
  • Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA;Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA;Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA;Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL;Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL;Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL;Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS-65, Berkeley, CA;Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS-65, Berkeley, CA;Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS-65, Berkeley, CA;Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA;Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA;National Center for Atmospheric Research, 1850 Table Mesa Drive, Boulder, CO

  • Venue:
  • Parallel Computing - Special issue: High performance computing with geographical data
  • Year:
  • 2003

Abstract

In numerous scientific disciplines, terabyte- and petabyte-scale data collections are emerging as critical community resources. A new class of "data grid" infrastructure is required to support management, transport, distributed access to, and analysis of these datasets by potentially thousands of users. Researchers who face this challenge include the climate modeling community, which performs long-duration computations accompanied by frequent output of very large files that must be further analyzed. We describe the Earth System Grid-I prototype, which brings together advanced analysis, replica management, data transfer, request management, and other technologies to support high-performance, interactive analysis of replicated data. We present performance results that demonstrate our ability to manage the location and movement of large datasets from the user's desktop. We report on experiments conducted over SciNET at SC'2000, where we achieved peak performance of 1.55 Gb/s and sustained performance of 512.9 Mb/s for data transfers between Texas and California. Finally, we describe the development of the next-generation Earth System Grid-II (ESG-II) project. Important issues for ESG-II include security requirements for production environments, efficient data filtering and transport, metadata services for discovery of relevant climate datasets, and sophisticated request or workflow management for complex tasks.
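
To give a sense of scale for the transfer rates quoted in the abstract, the following minimal sketch estimates how long an idealized transfer would take at the reported peak and sustained rates. The 1 TB dataset size, the decimal interpretation of the units (1 Gb/s = 10^9 bits/s, 1 TB = 10^12 bytes), and the helper name `transfer_hours` are assumptions made for illustration; they do not come from the paper, and the calculation ignores protocol overhead and storage bottlenecks.

```python
# Back-of-the-envelope transfer times at the rates reported in the abstract.
# The 1 TB dataset size is a hypothetical value chosen for illustration.

PEAK_BPS = 1.55e9        # 1.55 Gb/s peak, from the abstract
SUSTAINED_BPS = 512.9e6  # 512.9 Mb/s sustained, from the abstract


def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Return the ideal transfer time in hours, ignoring protocol overhead."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * 8 / rate_bits_per_s / 3600


if __name__ == "__main__":
    one_terabyte = 1e12  # assumed: decimal terabyte, 10**12 bytes
    for label, rate in (("peak", PEAK_BPS), ("sustained", SUSTAINED_BPS)):
        print(f"{label}: {transfer_hours(one_terabyte, rate):.2f} h")
```

Under these assumptions, a 1 TB dataset would need roughly 1.4 hours at the peak rate and roughly 4.3 hours at the sustained rate.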