BitDew: A data management and distribution service with multi-protocol file transfer and metadata abstraction

  • Authors:
  • Gilles Fedak; Haiwu He; Franck Cappello

  • Affiliations:
  • INRIA Saclay, Grand-Large, Orsay, F-91893, France and LRI, Univ Paris-Sud, CNRS, Orsay, F-91405, France (all three authors)

  • Venue:
  • Journal of Network and Computer Applications
  • Year:
  • 2009

Abstract

Desktop Grids use the computing, network and storage resources of idle desktop PCs distributed over multiple LANs or the Internet to compute a large variety of resource-demanding distributed applications. While these applications need to access, compute, store and circulate large volumes of data, little attention has been paid to data management in such large-scale, dynamic, heterogeneous, volatile and highly distributed Grids. In most cases, data management relies on ad hoc solutions, and providing a general approach is still a challenging issue. A new class of data management service is needed to handle a variety of file transfer protocols, such as client/server, P2P or the new and emerging Cloud storage services. To address this problem, we propose the BitDew framework, a programmable environment for automatic and transparent data management on computational Desktop Grids. This paper describes the BitDew programming interface, its architecture, and the performance evaluation of its runtime components. BitDew relies on a specific set of metadata to drive key data management operations, namely life cycle, distribution, placement, replication and fault tolerance, with a high level of abstraction. The BitDew runtime environment is a flexible distributed service architecture. It integrates modular P2P components such as DHTs (Distributed Hash Tables) for a Distributed Data Catalog, along with collaborative transport protocols for data distribution. We explain how to plug in new or existing protocols. We also give evidence of the versatility of the framework by implementing the HTTP, FTP and BitTorrent protocols and access to the Amazon S3 and IBP Wide Area Storage. We describe the mechanisms used to provide asynchronous and reliable multi-protocol transfers. Through several examples, we describe how application programmers and BitDew users can exploit BitDew's features.
We report on a performance evaluation using micro-benchmarks, various usage scenarios and a data-intensive bioinformatics application, both in the Grid context and on the Internet. The evaluation demonstrates that the high level of abstraction and transparency comes at a reasonable overhead, while offering the benefits of scalability, performance and fault tolerance at little programming cost.
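The abstract says that per-datum metadata drives life cycle, placement, replication and fault tolerance. The following minimal Python sketch illustrates that idea under stated assumptions. It is not BitDew's actual API. The `DataAttributes` fields and the `schedule` function are hypothetical names chosen for illustration. The policy itself is an assumed simplification: delete expired data, then top up the replica count on live hosts, ignoring failed holders only when the datum is marked fault tolerant.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class DataAttributes:
    """Hypothetical per-datum metadata (illustrative, not BitDew's real API)."""
    replica: int = 1                  # desired number of copies on worker hosts
    fault_tolerant: bool = False      # if True, failed holders no longer count as copies
    lifetime: Optional[float] = None  # absolute expiry time; None means never expires
    protocol: str = "http"            # transfer protocol used to move the datum


def schedule(attrs: DataAttributes, holders: Set[str], alive: Set[str],
             now: float) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (keep, add, delete) host sets for one datum.

    holders: hosts currently recorded as holding the datum
    alive:   hosts currently reachable
    now:     current time, compared against attrs.lifetime
    """
    # Life cycle: an expired datum is deleted from every live holder.
    if attrs.lifetime is not None and now >= attrs.lifetime:
        return set(), set(), holders & alive

    # Fault tolerance: for fault-tolerant data, only live holders count
    # toward the replica target; otherwise all recorded holders count.
    keep = holders & alive if attrs.fault_tolerant else set(holders)

    # Replication and placement: choose missing copies among live hosts
    # that do not already hold the datum (sorted for determinism).
    missing = max(attrs.replica - len(keep), 0)
    candidates = sorted(alive - holders)
    add = set(candidates[:missing])
    return keep, add, set()


if __name__ == "__main__":
    attrs = DataAttributes(replica=3, fault_tolerant=True, protocol="bittorrent")
    # Host C has failed, so one new copy is placed on a live non-holder.
    print(schedule(attrs, {"A", "B", "C"}, {"A", "B", "D", "E"}, now=0.0))
```

Keeping the decision a pure function of metadata and host state is one way to read the abstract's claim that metadata, rather than application code, drives data management. The actual transfer would then use whichever protocol `attrs.protocol` names.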