Assembling millions of short DNA sequences using SSAKE

Authors:
René L. Warren;Granger G. Sutton;Steven J. M. Jones;Robert A. Holt
Affiliations:
British Columbia Cancer Agency, Genome Sciences Centre 675 West 10th Avenue, Vancouver, BC V5Z 1L3, Canada;J. Craig Venter Institute, 9704 Medical Center Drive Rockville, MD 20850, USA;British Columbia Cancer Agency, Genome Sciences Centre 675 West 10th Avenue, Vancouver, BC V5Z 1L3, Canada;British Columbia Cancer Agency, Genome Sciences Centre 675 West 10th Avenue, Vancouver, BC V5Z 1L3, Canada
Venue:
Bioinformatics
Year:
2007

Abstract

Summary: Novel DNA sequencing technologies are emerging that could deliver up to three orders of magnitude more sequence throughput than conventional Sanger sequencing. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Repeats are ubiquitous in large genomes, and sequences this short cannot characterize them uniquely and unambiguously, so the short read length limits the instrument's applicability to de novo sequencing. However, given the instrument's sequencing depth and throughput, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information in short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets.

Availability: http://www.bcgsc.ca/bioinfo/software/ssake

Contact: rwarren@bcgsc.ca
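The abstract describes the core idea only at a high level: reads are indexed in a prefix tree, and a growing sequence is extended by whichever read offers the longest overlap. The Python sketch below illustrates that idea under stated assumptions and is not SSAKE's actual implementation. The names `PrefixTree`, `extend_3prime` and `assemble`, the prefix length `k=11`, and the minimum overlap `min_overlap=16` are hypothetical choices made for illustration. Seeds are taken in input order. Stopping whenever candidate reads disagree is a simple stand-in for "stringent" assembly, and it is not necessarily the rule SSAKE applies.

```python
from typing import Dict, List, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


class PrefixTree:
    """Trie keyed on the first k bases of each read; a leaf lists the reads."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.root: Dict = {}

    def insert(self, read: str) -> None:
        node = self.root
        for base in read[: self.k]:
            node = node.setdefault(base, {})
        node.setdefault("$", []).append(read)  # "$" cannot clash with A/C/G/T

    def lookup(self, seq: str) -> List[str]:
        """Reads whose first k bases equal the first k bases of seq."""
        node = self.root
        for base in seq[: self.k]:
            if base not in node:
                return []
            node = node[base]
        return node.get("$", [])


def extend_3prime(contig: str, tree: PrefixTree, used: Set[str],
                  min_overlap: int, max_len: int) -> str:
    """Greedily extend the 3' end, always trying the longest overlap first."""
    while True:
        extension = None
        for olap in range(min(len(contig), max_len - 1), min_overlap - 1, -1):
            suffix = contig[-olap:]
            hits = [r for r in tree.lookup(suffix)
                    if r not in used and len(r) > olap and r.startswith(suffix)]
            if not hits:
                continue  # no read overlaps by exactly olap bases; try shorter
            tails = [r[olap:] for r in hits]
            n = min(len(t) for t in tails)
            if len({t[:n] for t in tails}) > 1:
                return contig  # candidate reads disagree: stop rather than guess
            extension = tails[0][:n]
            for r in hits:
                used.update((r, revcomp(r)))  # consume both strands of the read
            break
        if extension is None:
            return contig  # no unused read overlaps by >= min_overlap bases
        contig += extension


def assemble(reads: List[str], k: int = 11, min_overlap: int = 16) -> List[str]:
    assert min_overlap >= k, "lookup needs at least k bases of overlap"
    tree = PrefixTree(k)
    for r in reads:
        tree.insert(r)
        tree.insert(revcomp(r))  # index both strands
    max_len = max(len(r) for r in reads)
    used: Set[str] = set()
    contigs = []
    for seed in reads:  # seeds taken in input order (an assumption)
        if seed in used:
            continue
        used.update((seed, revcomp(seed)))
        contig = extend_3prime(seed, tree, used, min_overlap, max_len)
        # Extend the 5' end by extending the reverse complement's 3' end.
        contig = revcomp(extend_3prime(revcomp(contig), tree, used,
                                       min_overlap, max_len))
        contigs.append(contig)
    return contigs
```

For example, three 25 nt reads tiled every 5 bases across a 35 nt sequence with no repeated 16-mers assemble back into that sequence. This holds even when one read is supplied as its reverse complement, because both strands are indexed. Trying overlaps from longest to shortest mirrors the abstract's "longest possible overlap" search. The trie only narrows the candidates by their first `k` bases, and `startswith` confirms the full overlap.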