SDQuery DSI: integrating data management support with a wide area data transfer protocol

  • Authors:
  • Yu Su;Yi Wang;Gagan Agrawal;Rajkumar Kettimuthu

  • Affiliations:
  • The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The University of Chicago and Argonne National Laboratory, Argonne, IL

  • Venue:
  • SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2013

Abstract

In many science areas where datasets need to be transferred or shared, rapid growth in dataset size, coupled with much slower increases in wide area data transfer bandwidth, is making it extremely hard for scientists to analyze the data. This paper addresses these limitations by developing SDQuery DSI, a GridFTP plug-in that supports flexible server-side data subsetting. An existing GridFTP server can load this plug-in dynamically to gain the new functionality. The tool supports several query types: queries over dimensions, coordinates, and values. It also applies a number of optimizations, such as parallel indexing, a performance model for data subsetting, and parallel streaming. We compare SDQuery DSI with the default GridFTP File DSI in different network environments and show that our method achieves better efficiency in almost all cases.
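The abstract names three query types but does not define them. The sketch below shows what each kind of subsetting could mean for a small 2D gridded dataset. It assumes sorted coordinate axes. The dataset, the function names, and the range conventions (half-open index ranges, inclusive coordinate and value ranges) are hypothetical and are not the SDQuery DSI interface. The value query uses a plain linear scan for clarity. It does not model the parallel indexing the abstract mentions.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy dataset: DATA[i][j] is the value at (LATS[i], LONS[j]).
LATS = [10.0, 20.0, 30.0, 40.0]
LONS = [100.0, 110.0, 120.0]
DATA = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]

def query_dimensions(data, rows, cols):
    """Dimension query: select by half-open index ranges [start, stop) on each axis."""
    return [row[cols[0]:cols[1]] for row in data[rows[0]:rows[1]]]

def coord_to_index_range(coords, lo, hi):
    """Map an inclusive coordinate range [lo, hi] onto a half-open index range
    over a sorted coordinate axis."""
    return bisect_left(coords, lo), bisect_right(coords, hi)

def query_coordinates(data, lats, lons, lat_range, lon_range):
    """Coordinate query: translate coordinate ranges into index ranges, then subset."""
    rows = coord_to_index_range(lats, *lat_range)
    cols = coord_to_index_range(lons, *lon_range)
    return query_dimensions(data, rows, cols)

def query_values(data, lo, hi):
    """Value query: return (row, col, value) for every element with lo <= value <= hi.
    Uses a linear scan; an indexed implementation would avoid touching every element."""
    return [(i, j, v)
            for i, row in enumerate(data)
            for j, v in enumerate(row)
            if lo <= v <= hi]

if __name__ == "__main__":
    print(query_dimensions(DATA, (1, 3), (0, 2)))
    print(query_coordinates(DATA, LATS, LONS, (15.0, 30.0), (110.0, 120.0)))
    print(query_values(DATA, 5.5, 8.0))
```

With server-side subsetting, only the selected elements would need to cross the wide area network rather than the whole file. That reduction is the motivation the abstract gives for the plug-in.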