XenoCluster: a grid computing approach to finding ancient evolutionary genetic anomalies

  • Authors:
  • Jesse D. Walters;Thomas L. Casavant;John P. Robinson;Thomas B. Bair;Terry A. Braun;Todd E. Scheetz

  • Affiliations:
  • Coordinated Laboratory for Computational Genomics, Iowa City, IA;Center for Bioinformatics and Computational Biology, Iowa City, IA;Coordinated Laboratory for Computational Genomics, Iowa City, IA;Coordinated Laboratory for Computational Genomics, Iowa City, IA;Center for Bioinformatics and Computational Biology, Iowa City, IA;Center for Bioinformatics and Computational Biology, Iowa City, IA

  • Venue:
  • PaCT'05 Proceedings of the 8th international conference on Parallel Computing Technologies
  • Year:
  • 2005

Abstract

This paper describes and evaluates a coarse-grained parallel computational approach to identifying rare evolutionary events often referred to as “horizontal gene transfers”. Unlike classical genetic evolution, in which variations in genes accumulate gradually within and among species, horizontal transfer events cause a set of potentially important genes to “jump” directly from the genetic material of one species to that of another. Such genes, known as xenologs, appear as anomalies when phylogenetic trees are compared for normal and xenologous genes from the same sets of species. However, such comparisons have not previously been possible due to a lack of data and computational capacity. With the availability of large numbers of computer clusters, as well as genomic sequence from more than 2,000 species containing as many as 35,000 genes each, and trillions of sequence nucleotides in all, it is now possible to examine “clusters” of genes using phylogenetic tree “similarity” as a distance metric. The full version of this problem requires years of CPU time, yet makes only modest interprocess communication (IPC) and memory demands; thus, it is an ideal candidate for a grid computing approach. This paper describes such a solution and presents preliminary benchmarking results that show a reduction in total execution time from approximately two years to less than two weeks. Finally, we report on several trade-offs among various partitionings of the problem across WAN nodes and across LAN/WAN networks of tightly coupled computing clusters.
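The idea of treating phylogenetic tree “similarity” as a distance and splitting the comparisons into independent work units can be illustrated with a minimal sketch. The abstract does not name the metric or the partitioning scheme, so everything here is an assumption: the distance is a simple symmetric difference over clade sets (in the style of the Robinson-Foulds distance), the pairs are dealt round-robin, and the names `tree_distance`, `partition_pairs`, `run_unit`, and the toy gene trees are hypothetical.

```python
from itertools import combinations


def tree_distance(clades_a, clades_b):
    """Count clades present in exactly one of the two trees.

    Assumption: each tree is summarized as a set of clades, and the
    symmetric difference stands in for the paper's unnamed metric.
    """
    return len(clades_a ^ clades_b)


def partition_pairs(gene_ids, n_workers):
    """Deal all unordered gene pairs round-robin into independent work units."""
    units = [[] for _ in range(n_workers)]
    for i, pair in enumerate(combinations(gene_ids, 2)):
        units[i % n_workers].append(pair)
    return units


def run_unit(unit, trees):
    """Compute distances for one work unit; needs no data from other units."""
    return {(a, b): tree_distance(trees[a], trees[b]) for a, b in unit}


# Hypothetical toy data: each gene tree is a set of clades,
# and each clade is a frozenset of species labels.
trees = {
    "geneA": {frozenset({"sp1", "sp2"}), frozenset({"sp3", "sp4"})},
    "geneB": {frozenset({"sp1", "sp2"}), frozenset({"sp3", "sp4"})},
    "geneX": {frozenset({"sp1", "sp3"}), frozenset({"sp2", "sp4"})},
}

units = partition_pairs(sorted(trees), n_workers=2)

# On a grid, each unit would run on a separate node; here they run in turn.
distances = {}
for unit in units:
    distances.update(run_unit(unit, trees))
```

In this toy input, geneA and geneB share the same tree, while geneX groups the species differently and therefore sits far from both, which is the kind of anomaly the abstract associates with xenologs. Because each work unit needs only its own pairs and the corresponding trees, units exchange no data while running, which matches the modest communication demands described above.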