A parallel implementation for determining genomic distances under deletion and insertion

  • Authors:
  • Vijaya Smitha Kolli;Hui Liu;Michelle Hong Pan;Yi Pan

  • Affiliations:
  • Department of Computer Science, Georgia State University, Atlanta, GA;Department of Computer Science, Georgia State University, Atlanta, GA;Career Development Division, Public Health Informatics Fellow Program, Centers for Disease Control and Prevention, Office of Workforce and Career Development, Atlanta, GA;Department of Computer Science, Georgia State University, Atlanta, GA

  • Venue:
  • ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
  • Year:
  • 2005

Abstract

As the need to compare the genomes of different species has grown dramatically with the rapid progress of the Human Genome Project, evolution at the level of whole genomes has attracted increasing attention from both biologists and computer scientists. They are especially interested in scenarios in which a genome evolves through insertions, deletions, and movements of genes along its chromosomes. Marron et al. proposed a polynomial-time approximation algorithm to compute (near) minimum edit distances under inversions, deletions, and unrestricted insertions. Our work is based on their algorithm, which performs a large number of comparison and sorting operations to calculate the edit distance. These operations are extremely time-consuming and reduce computational efficiency. We believe the running time of the algorithm can be improved through parallelization. We parallelize their algorithm with OpenMP using the Intel C++ Compiler for Linux 7.1, and compare three levels of parallelism: coarse grain, fine grain, and a combination of both. The experiments are conducted for varying numbers of threads and different lengths of the gene sequences. The experimental results show that neither coarse-grain nor fine-grain parallelism alone improves the performance of the algorithm very much. However, combining fine-grain and coarse-grain parallelism improves the performance of the algorithm drastically.
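
The three levels of parallelism compared in the abstract can be illustrated with a minimal OpenMP sketch. It is not the authors' code and does not implement the Marron et al. algorithm. The distance is a placeholder (a count of mismatched positions between two equal-length signed gene orders), and the names `Genome`, `mismatches_serial`, `mismatches_fine`, and `pairwise_coarse` are hypothetical. The sketch assumes that "fine grain" means splitting the comparison loop inside one distance computation and that "coarse grain" means distributing independent genome pairs across threads. The abstract does not spell out either definition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A signed gene order: each gene is an integer, the sign encodes orientation.
using Genome = std::vector<int>;

// Placeholder metric (NOT the Marron et al. edit distance):
// the number of positions at which two equal-length gene orders differ.
long mismatches_serial(const Genome& a, const Genome& b) {
    assert(a.size() == b.size());
    long count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) ++count;
    }
    return count;
}

// Fine grain: the comparison loop of a single distance computation is
// split across threads, and the partial counts are summed by a reduction.
long mismatches_fine(const Genome& a, const Genome& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < n; ++i) {
        if (a[i] != b[i]) ++count;
    }
    return count;
}

// Coarse grain: each row of the pairwise distance matrix is an independent
// unit of work. With fine_inner == true the inner computation is also
// parallel, which gives the combined variant. Nested regions only use extra
// threads when nested parallelism is enabled (for example through the
// OMP_MAX_ACTIVE_LEVELS environment variable); otherwise they run serially.
std::vector<long> pairwise_coarse(const std::vector<Genome>& genomes,
                                  bool fine_inner) {
    const long n = static_cast<long>(genomes.size());
    std::vector<long> dist(static_cast<std::size_t>(n * n), 0);
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < n; ++j) {
            // Each (i, j) cell is written by exactly one thread.
            dist[static_cast<std::size_t>(i * n + j)] =
                fine_inner ? mismatches_fine(genomes[i], genomes[j])
                           : mismatches_serial(genomes[i], genomes[j]);
        }
    }
    return dist;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC or Clang), `pairwise_coarse(genomes, false)` is the coarse-grain variant, calling `mismatches_fine` directly is the fine-grain variant, and `pairwise_coarse(genomes, true)` combines the two. Without the OpenMP flag the pragmas are ignored and the same code runs serially, which is convenient for checking that all variants return identical results.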