Fragment assembly with short reads

Authors:
Mark Chaisson;Pavel Pevzner;Haixu Tang
Affiliations:
Bioinformatics Program;Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA;Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
Venue:
Bioinformatics
Year:
2004

Cited by (10)

Finite automata for testing composition-based reconstructibility of sequences
Journal of Computer and System Sciences

Detecting Repeat Families in Incompletely Sequenced Genomes
WABI '08 Proceedings of the 8th international workshop on Algorithms in Bioinformatics

Correcting short reads with high error rates for improved sequencing result
International Journal of Bioinformatics Research and Applications

Brief Communication: Whole genome assembly from 454 sequencing output via modified DNA graph concept
Computational Biology and Chemistry

Ab initio whole genome shotgun assembly with mated short reads
RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology

PSAEC: an improved algorithm for short read error correction using partial suffix arrays
FAW-AAIM'11 Proceedings of the 5th joint international frontiers in algorithmics, and 7th international conference on Algorithmic aspects in information and management

An efficient hybrid approach to correcting errors in short reads
MDAI'11 Proceedings of the 8th international conference on Modeling decisions for artificial intelligence

IDBA: a practical iterative de Bruijn graph de novo assembler
RECOMB'10 Proceedings of the 14th Annual international conference on Research in Computational Molecular Biology

Deciding unique decodability of bigram counts via finite automata
Journal of Computer and System Sciences

Benchmark datasets for the DNA fragment assembly problem
International Journal of Bio-Inspired Computation

Abstract

Motivation: Current DNA sequencing technology produces reads of about 500–750 bp, with typical coverage under 10×. New sequencing technologies are emerging that produce shorter reads (length 80–200 bp) but allow one to generate significantly higher coverage (30× and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads.

Results: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts.

Availability: Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html
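
To make the coverage figures in the Motivation concrete, the sketch below uses the standard relation coverage = (number of reads × read length) / genome length to estimate how many reads each regime implies. It is only an illustration and is not taken from the paper: the 2 Mbp genome size and the function name `reads_needed` are hypothetical, and the read lengths (600 bp and 100 bp) are simply values picked from inside the ranges quoted in the abstract.

```python
import math


def reads_needed(genome_length: int, read_length: int, coverage: float) -> int:
    """Smallest read count N with N * read_length / genome_length >= coverage.

    Hypothetical helper for illustration; it ignores read errors,
    uneven sampling, and repeats.
    """
    return math.ceil(coverage * genome_length / read_length)


# Assumed genome size (not from the paper): 2 Mbp.
GENOME_LENGTH = 2_000_000

# Longer reads at 10x coverage vs. short reads at 30x coverage.
long_read_count = reads_needed(GENOME_LENGTH, read_length=600, coverage=10)
short_read_count = reads_needed(GENOME_LENGTH, read_length=100, coverage=30)

print(long_read_count)   # 33334
print(short_read_count)  # 600000
```

Under these assumptions the short-read regime involves roughly 18 times as many reads, which gives a sense of why tools tuned for fewer, longer reads may not carry over directly.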