Application re-structuring and data management on a GRID environment: a case study for bioinformatics

  • Authors:
  • Giovanni Ciriello; Matteo Comin; Concettina Guerra

  • Affiliations:
  • University of Padova, Dept. of Information Engineering, Padova; University of Padova, Dept. of Information Engineering, Padova; University of Padova, Dept. of Information Engineering, Padova, and Georgia Institute of Technology, College of Computing, Atlanta, GA

  • Venue:
  • IPDPS'06: Proceedings of the 20th International Conference on Parallel and Distributed Processing
  • Year:
  • 2006


Abstract

This paper describes a distributed implementation of PROuST, a method for protein structure comparison, which involves a major restructuring of the application for efficient deployment on a grid. PROuST consists of several components that perform different tasks at different stages. Given a target protein, an index-based search retrieves from a database a list of proteins that are good candidates for similarity; a dynamic programming algorithm then aligns the target protein with each candidate protein. Different components of PROuST use the same geometric properties of the proteins' secondary structure elements, so an important issue in the distributed implementation is the tradeoff between transferring data and recomputing it. Our implementation avoids recomputation by reusing the hash table data as much as possible once they have been accessed. These algorithmic changes reduce the number of accesses to storage elements and, consequently, the execution time. In addition, this paper discusses data replication strategies in a grid environment to optimize the data transfer time.
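The "access once, reuse" idea can be illustrated with a minimal sketch. It is not PROuST's implementation: the names `StorageElement`, `BucketCache`, `candidate_search` and `shared_keys` are hypothetical, and it assumes that hash keys are derived from geometric properties of secondary structure element pairs and that each hash bucket lists the proteins containing that key. The first stage votes for candidates by fetching buckets from a (simulated) storage element; a later stage re-reads the same buckets, which a local cache serves without touching storage again.

```python
from collections import Counter


class StorageElement:
    """Hypothetical remote storage holding hash-table buckets; counts every fetch."""

    def __init__(self, buckets):
        self._buckets = buckets
        self.accesses = 0

    def fetch(self, key):
        self.accesses += 1
        return self._buckets.get(key, [])


class BucketCache:
    """Keeps each bucket in local memory after its first fetch (assumed reuse policy)."""

    def __init__(self, storage):
        self._storage = storage
        self._local = {}

    def get(self, key):
        if key not in self._local:
            self._local[key] = self._storage.fetch(key)
        return self._local[key]


def candidate_search(target_keys, table, top_k):
    """Index-based search: vote for proteins that share hash keys with the target."""
    votes = Counter()
    for key in target_keys:
        for protein_id in table.get(key):
            votes[protein_id] += 1
    # Ties keep first-encountered order, as documented for Counter.most_common.
    return [pid for pid, _ in votes.most_common(top_k)]


def shared_keys(target_keys, candidate, table):
    """Later stage (hypothetical): re-reads the same buckets to find the keys the
    target shares with one candidate, e.g. to seed a pairwise alignment."""
    return [key for key in target_keys if candidate in table.get(key)]


# Usage with toy data: keys are placeholders for hashed geometric features.
buckets = {"k1": ["P1", "P2"], "k2": ["P1"], "k3": ["P3"]}
store = StorageElement(buckets)
table = BucketCache(store)
target = ["k1", "k2", "k3"]

cands = candidate_search(target, table, top_k=2)
seeds = {c: shared_keys(target, c, table) for c in cands}
print(cands, seeds, "storage accesses:", store.accesses)
```

Here only the three initial fetches reach storage; without the cache, the second stage would fetch every bucket again for each candidate. The trade-off is local memory held for the cached buckets, which the sketch does not bound.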