XenoCluster: a grid computing approach to finding ancient evolutionary genetic anomalies

Authors:
Jesse D. Walters; Thomas L. Casavant; John P. Robinson; Thomas B. Bair; Terry A. Braun; Todd E. Scheetz
Affiliations:
Coordinated Laboratory for Computational Genomics, Iowa City, IA; Center for Bioinformatics and Computational Biology, Iowa City, IA; Coordinated Laboratory for Computational Genomics, Iowa City, IA; Coordinated Laboratory for Computational Genomics, Iowa City, IA; Center for Bioinformatics and Computational Biology, Iowa City, IA; Center for Bioinformatics and Computational Biology, Iowa City, IA
Venue:
PaCT'05: Proceedings of the 8th International Conference on Parallel Computing Technologies
Year:
2005

Abstract

This paper describes and evaluates a coarse-grained parallel computational approach to identifying rare evolutionary events often referred to as “horizontal gene transfers”. Unlike classical genetic evolution, in which variations in genes accumulate gradually within and among species, horizontal transfer events result in a set of potentially important genes that “jump” directly from the genetic material of one species to another. Such genes, known as xenologs, appear as anomalies when the phylogenetic trees of normal and xenologous genes from the same set of species are compared. However, such comparisons have not previously been possible at scale due to a lack of data and computational capacity. With the availability of large numbers of computer clusters, as well as genomic sequences from more than 2,000 species containing as many as 35,000 genes each (trillions of nucleotides in all), it is now possible to examine “clusters” of genes using phylogenetic tree “similarity” as a distance metric. The full version of this problem requires years of CPU time yet makes only modest inter-process communication (IPC) and memory demands; it is therefore an ideal candidate for a grid computing approach. This paper describes such a solution, together with preliminary benchmarking results showing a reduction in total execution time from approximately two years to less than two weeks. Finally, we report on several trade-offs among different partitionings of the problem across WAN nodes and across LAN/WAN networks of tightly coupled computing clusters.
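The abstract does not specify which tree-similarity metric is used or how the work is divided among nodes. Purely as an illustration of why all-pairs tree comparison suits coarse-grained grid execution, the Python sketch below assumes a normalized rooted Robinson-Foulds distance over clade sets, splits the all-pairs comparison into contiguous work units that independent nodes could process without communicating, and groups genes by single-linkage thresholding. The function names (`rf_distance`, `make_work_units`, `run_unit`, `cluster`), the clade-set tree representation, and the 0.5 threshold are hypothetical; this is not the XenoCluster implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Assumed representation: a rooted gene tree reduced to its set of
# nontrivial clades, each clade being the frozenset of leaf (species) names.
Clade = FrozenSet[str]
Tree = Set[Clade]
Pair = Tuple[int, int]


def rf_distance(a: Tree, b: Tree) -> float:
    """Normalized rooted Robinson-Foulds distance (assumed metric): 0.0 = same topology."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return len(a ^ b) / total  # clades present in exactly one of the two trees


def make_work_units(n_trees: int, n_units: int) -> List[List[Pair]]:
    """Split all i<j tree pairs into n_units contiguous chunks of near-equal size."""
    pairs = list(combinations(range(n_trees), 2))
    size, extra = divmod(len(pairs), n_units)
    units, start = [], 0
    for u in range(n_units):
        end = start + size + (1 if u < extra else 0)
        units.append(pairs[start:end])
        start = end
    return units


def run_unit(trees: List[Tree], unit: List[Pair]) -> Dict[Pair, float]:
    """Work done by one node: needs only the input trees, returns a small result."""
    return {(i, j): rf_distance(trees[i], trees[j]) for i, j in unit}


def cluster(dist: Dict[Pair, float], n_trees: int, threshold: float) -> List[List[int]]:
    """Single-linkage grouping via union-find: link trees with distance <= threshold."""
    parent = list(range(n_trees))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), d in dist.items():
        if d <= threshold:
            parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for t in range(n_trees):
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


# Toy input over species A-E: trees 0 and 1 share a topology, tree 2 does not.
trees: List[Tree] = [
    {frozenset("AB"), frozenset("CD"), frozenset("ABCD")},
    {frozenset("AB"), frozenset("CD"), frozenset("ABCD")},
    {frozenset("AC"), frozenset("BD"), frozenset("ABCD")},
]

# Each unit could run on a separate grid node; results are merged afterward.
dist: Dict[Pair, float] = {}
for unit in make_work_units(len(trees), 2):
    dist.update(run_unit(trees, unit))

print(cluster(dist, len(trees), 0.5))  # [[0, 1], [2]]
```

Because each work unit reads only the (replicated) input trees and emits a compact table of distances, nodes exchange no messages while computing, which mirrors the abstract's point that the problem makes modest IPC and memory demands. A real deployment would also have to weigh unit size against scheduling overhead and data staging across WAN links.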