Motivation: Current DNA sequencing technology produces reads of about 500–750 bp, with typical coverage under 10×. New sequencing technologies are emerging that produce shorter reads (80–200 bp) but allow one to generate significantly higher coverage (30× and higher) at low cost. Modern assembly programs and error-correction routines have been tuned to work well with current read technology but were not designed for the assembly of short reads.

Results: We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts.

Availability: Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html
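The coverage figures quoted in the Motivation follow from the standard definition of average depth, c = N·L/G, for N reads of length L over a genome of length G. The sketch below is illustrative only and is not the authors' routine. The 5 Mb genome size and the helper names (`coverage`, `reads_needed`, `expected_uncovered_fraction`) are hypothetical. It compares a long-read regime (600 bp at 8×) with a short-read regime (100 bp at 30×) and uses the Lander–Waterman (Poisson) estimate e^(-c) for the fraction of bases left uncovered. Because that estimate ignores read overlaps, sequencing errors and repeats, it shows only that high coverage is cheap to reach with short reads. It does not capture the assembly limitations the abstract describes.

```python
import math


def coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage: float, read_length: int, genome_length: int) -> int:
    """Smallest number of reads N with N * L / G >= target_coverage."""
    return math.ceil(target_coverage * genome_length / read_length)


def expected_uncovered_fraction(c: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read.

    Assumes reads start uniformly at random; ignores errors and repeats.
    """
    return math.exp(-c)


if __name__ == "__main__":
    G = 5_000_000  # hypothetical 5 Mb genome, chosen only for illustration
    # (read length in bp, target coverage): long-read vs short-read regimes
    for L, c in [(600, 8), (100, 30)]:
        n = reads_needed(c, L, G)
        print(f"L={L} bp at {c}x: {n} reads, "
              f"expected uncovered fraction ~ {expected_uncovered_fraction(c):.2e}")
```

With these assumed numbers the short-read regime needs 1,500,000 reads versus 66,667 for the long-read regime. Both leave a negligible fraction of bases uncovered, which is why coverage by itself does not explain the finishing burden reported above.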