Assembling genomes on large-scale parallel computers

Authors:
Anantharaman Kalyanaraman;Scott J. Emrich;Patrick S. Schnable;Srinivas Aluru
Affiliations:
Department of Electrical and Computer Engineering, Iowa State University, Ames, IA;Department of Electrical and Computer Engineering, Iowa State University, Ames, IA and Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA;Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA and Departments of Agronomy, and Genetics, Development and Cell Biology, Iowa State University, Ames, IA;Department of Electrical and Computer Engineering, Iowa State University, Ames, IA and Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

Assembly of large genomes from tens of millions of short genomic fragments is computationally demanding, requiring hundreds of gigabytes of memory and tens of thousands of CPU hours. New gene-enrichment sequencing strategies are expected to further exacerbate this situation. In this paper, we present a massively parallel genome assembly framework. The unique features of our approach include space-efficient and on-demand algorithms that consume only linear space, and heuristic strategies that reduce the number of expensive pairwise sequence alignments while maintaining assembly quality. As part of the ongoing efforts in maize genome sequencing, we applied our assembly framework to the largest available collection of maize genomic data. We report the partitioning of more than 1.6 million fragments, totaling over 1.25 billion nucleotides, into genomic islands in 2 hours on 1,024 processors of an IBM BlueGene/L supercomputer.
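
The abstract does not spell out how the number of pairwise alignments is reduced. One plausible heuristic, shown here purely as an illustrative sketch and not as the authors' implementation, is to track clusters with a union-find structure and skip the alignment for any candidate pair whose fragments already belong to the same cluster. The names `DisjointSet`, `cluster_fragments`, `candidate_pairs`, and `overlaps` are hypothetical, and `overlaps` stands in for the expensive alignment test.

```python
from typing import Callable, Iterable, List, Tuple


class DisjointSet:
    """Union-find over fragment indices 0..n-1 (hypothetical helper)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_fragments(
    n: int,
    candidate_pairs: Iterable[Tuple[int, int]],
    overlaps: Callable[[int, int], bool],
) -> Tuple[List[int], int]:
    """Return (cluster label per fragment, number of alignments performed).

    `overlaps` is an assumed placeholder for the costly pairwise alignment.
    """
    clusters = DisjointSet(n)
    alignments = 0
    for i, j in candidate_pairs:
        # Assumed heuristic: a pair already in the same cluster cannot
        # change the partition, so its alignment is skipped.
        if clusters.find(i) == clusters.find(j):
            continue
        alignments += 1
        if overlaps(i, j):
            clusters.union(i, j)
    return [clusters.find(i) for i in range(n)], alignments


if __name__ == "__main__":
    # Toy input: five fragments, four candidate pairs, every tested pair overlaps.
    pairs = [(0, 1), (1, 2), (0, 2), (3, 4)]
    labels, n_align = cluster_fragments(5, pairs, lambda i, j: True)
    print(labels, n_align)
```

In this toy run the pair (0, 2) is never aligned because fragments 0 and 2 were already merged through fragment 1, so three alignments suffice for four candidate pairs. The order in which candidate pairs are examined affects how many alignments can be skipped; the sketch makes no claim about the ordering used in the paper.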