Summary: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Because repeats are ubiquitous in large genomes and short sequences cannot characterize them uniquely and unambiguously, the short read length limits the applicability of this technology to de novo sequencing. However, given the sequencing depth and throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets.
Availability: http://www.bcgsc.ca/bioinfo/software/ssake
Contact: rwarren@bcgsc.ca
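
The longest-overlap search described in the summary can be illustrated with a small Python sketch. This is not the SSAKE implementation: it assumes error-free reads, extends contigs in the 3' direction only, ignores reverse complements and read abundance, and uses a dictionary keyed by read prefixes as a stand-in for the prefix tree. The names `build_prefix_index`, `extend_contig`, `assemble`, and the `min_overlap` parameter are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_prefix_index(reads, min_overlap):
    """Map each proper prefix of length >= min_overlap to the reads starting with it.

    Assumption: a flat dictionary replaces the prefix tree named in the summary.
    """
    index = defaultdict(set)
    for read in reads:
        for k in range(min_overlap, len(read)):
            index[read[:k]].add(read)
    return index


def extend_contig(seed, unused, index, min_overlap, max_read_len):
    """Greedily extend `seed` at its 3' end, trying the longest overlap first."""
    contig = seed
    while True:
        extended = False
        longest = min(len(contig), max_read_len - 1)
        for k in range(longest, min_overlap - 1, -1):
            suffix = contig[-k:]
            # Reads whose first k bases match the last k bases of the contig.
            candidates = index.get(suffix, set()) & unused
            if candidates:
                read = min(candidates)  # deterministic tie-break (assumption)
                contig += read[k:]      # append only the non-overlapping tail
                unused.discard(read)
                extended = True
                break
        if not extended:
            return contig


def assemble(reads, min_overlap):
    """Assemble reads into contigs, seeding in input order (assumption)."""
    unused = set(reads)
    index = build_prefix_index(unused, min_overlap)
    max_read_len = max((len(r) for r in unused), default=0)
    contigs = []
    for seed in reads:
        if seed not in unused:
            continue  # already consumed by an earlier contig
        unused.discard(seed)
        contigs.append(extend_contig(seed, unused, index, min_overlap, max_read_len))
    return contigs


if __name__ == "__main__":
    print(assemble(["ACGTAC", "GTACGG", "ACGGTT"], min_overlap=3))
```

Because each extension consumes one unused read, the loop always terminates. The result depends on seed order, since this sketch never extends a contig at its 5' end.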