Grid Deployment of Legacy Bioinformatics Applications with Transparent Data Access

  • Authors:
  • Christophe Blanchet; Remi Mollon; Douglas Thain; Gilbert Deleage

  • Affiliations:
  • Christophe Blanchet: Institut de Biologie et Chimie des Protéines, IBCP UMR 5086 / CNRS / Univ. Lyon 1 / IFR128 BioSciences Lyon-Gerland, 7 passage du Vercors, 69367 Lyon cedex 07, France. Christophe.Blanch
  • Remi Mollon: Institut de Biologie et Chimie des Protéines, IBCP UMR 5086 / CNRS / Univ. Lyon 1 / IFR128 BioSciences Lyon-Gerland, 7 passage du Vercors, 69367 Lyon cedex 07, France. Remi.Mollon@ibcp.fr
  • Douglas Thain: Department of Computer Science and Engineering, University of Notre Dame, 384 Fitzpatrick Hall, Notre Dame, Indiana, United States. dthain@cse.nd.edu
  • Gilbert Deleage: Institut de Biologie et Chimie des Protéines, IBCP UMR 5086 / CNRS / Univ. Lyon 1 / IFR128 BioSciences Lyon-Gerland, 7 passage du Vercors, 69367 Lyon cedex 07, France. Gilbert.Deleage@i

  • Venue:
  • GRID '06 Proceedings of the 7th IEEE/ACM International Conference on Grid Computing
  • Year:
  • 2006

Abstract

Although grid computing offers great potential for executing large-scale bioinformatics applications, practical deployment is constrained by legacy interfaces. Most widely deployed bioinformatics applications were designed long before grid computing arose, and thus were created, tested, and validated in the familiar environment of a workstation. Most perform simple local I/O and have no facility for interfacing with a distributed system. Because of these limitations, users of bioinformatics applications are generally constrained to building large local clusters in order to perform data analysis. To deploy these applications in wide-area grid systems, users require a transparent mechanism for attaching legacy interfaces to grid I/O systems. We have explored this problem by deploying several bioinformatics databases and programs for protein sequence analysis on the European EGEE grid. Using tools for transparent adaptation, we have connected legacy applications to the logical namespace provided by a replica manager, and compared the performance of remote access versus file staging. For common bioinformatics applications, we find that remote access performs equal to or better than simple file staging, with the added advantage that users are freed from stating the data needs of applications in advance.
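To make the contrast between the two access strategies compared above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a toy in-memory replica catalog and storage element; `ReplicaCatalog`, `Storage`, `stage_in`, `RemoteFile`, the `lfn:`/`se:` names, and the "first replica wins" selection rule are all hypothetical. Staging must be told every input up front and copies each file whole; remote access resolves the logical name when the file is opened and fetches only the bytes the application actually reads.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Hypothetical replica manager: maps logical file names to replica URLs."""
    replicas: dict = field(default_factory=dict)

    def resolve(self, lfn: str) -> str:
        # Assumption: simply take the first registered replica. A real
        # replica manager could choose by proximity or load instead.
        return self.replicas[lfn][0]


@dataclass
class Storage:
    """Hypothetical storage element that counts every byte it serves."""
    blobs: dict
    bytes_transferred: int = 0

    def read(self, url: str, offset: int, length: int) -> bytes:
        data = self.blobs[url][offset:offset + length]
        self.bytes_transferred += len(data)
        return data


def stage_in(catalog: ReplicaCatalog, storage: Storage, lfns: list) -> dict:
    """File staging: copy every declared input in full before the job starts."""
    local = {}
    for lfn in lfns:
        url = catalog.resolve(lfn)
        local[lfn] = storage.read(url, 0, len(storage.blobs[url]))
    return local


class RemoteFile:
    """Remote access: resolve the name at open time, fetch bytes on each read."""

    def __init__(self, catalog: ReplicaCatalog, storage: Storage, lfn: str):
        self.storage = storage
        self.url = catalog.resolve(lfn)
        self.pos = 0

    def read(self, n: int) -> bytes:
        data = self.storage.read(self.url, self.pos, n)
        self.pos += len(data)
        return data


def demo():
    lfn = "lfn:/grid/biomed/db.fasta"  # hypothetical logical file name
    url = "se://storage-a/db.fasta"    # hypothetical replica location
    catalog = ReplicaCatalog({lfn: [url]})

    staged = Storage({url: b"A" * 1_000_000})
    stage_in(catalog, staged, [lfn])   # inputs must be listed in advance

    remote = Storage({url: b"A" * 1_000_000})
    RemoteFile(catalog, remote, lfn).read(100)  # application reads 100 bytes
    return staged.bytes_transferred, remote.bytes_transferred


if __name__ == "__main__":
    print(demo())
```

In this toy model, `demo()` returns `(1000000, 100)`: staging moves the whole file, while remote access moves only the 100 bytes read. The sketch ignores per-request latency and protocol overhead, so it illustrates the mechanism only and says nothing about the performance measured in the paper.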