Semantic-based distributed i/o with the paramedic framework

  • Authors:
  • Pavan Balaji;Wuchun Feng;Heshan Lin

  • Affiliations:
  • Argonne National Laboratory, Argonne, IL, USA;Virginia Tech, Blacksburg, VA, USA;North Carolina State University, Raleigh, NC, USA

  • Venue:
  • HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called "ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing" which uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.