The fragment assembly string graph

Authors:
Eugene W. Myers
Affiliations:
Department of Computer Science, University of California Berkeley, CA, USA
Venue:
Bioinformatics
Year:
2005

Abstract

We present a concept and formalism, the string graph, which represents all that is inferable about a DNA sequence from a collection of shotgun sequencing reads collected from it. We give time- and space-efficient algorithms for constructing a string graph given the collection of overlaps between the reads and, in particular, present a novel linear expected-time algorithm for transitive reduction in this context. The result demonstrates that the decomposition of reads into k-mers employed in the de Bruijn graph approach described earlier is not essential, and exposes its close connection to the unitig approach we developed at Celera. This paper is a preliminary piece giving the basic algorithm and results that demonstrate the efficiency and scalability of the method. These ideas are being used to build a next-generation whole genome assembler called BOA (Berkeley Open Assembler) that will easily scale to mammalian genomes.

Contact: gene@eecs.berkeley.edu
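
To make the idea of transitive reduction on an overlap graph concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. Reads are plain node names, and each edge carries a hypothetical "overhang" length: the number of bases the target read adds beyond the source read. The function name `transitive_reduction` and the dictionary layout are invented for this example. The sketch uses a simple marking scheme and omits reverse-complement (bidirected) edges and any tolerance for sequencing error.

```python
def transitive_reduction(edges):
    """Remove edges v->x that are implied by a path v->w->x.

    edges: dict mapping a read name to a list of (neighbor, overhang) pairs.
    This layout is an assumption made for illustration only.
    """
    VACANT, INPLAY, ELIMINATED = 0, 1, 2

    # Sort every adjacency list once by overhang length (shortest first).
    sorted_edges = {v: sorted(out, key=lambda e: e[1]) for v, out in edges.items()}
    mark = {}
    reduced = {}

    for v, out in sorted_edges.items():
        if not out:
            reduced[v] = []
            continue

        # Every direct neighbor of v starts out as a candidate to keep.
        for w, _ in out:
            mark[w] = INPLAY
        longest = out[-1][1]

        for w, vw_len in out:
            if mark[w] != INPLAY:
                continue
            for x, wx_len in sorted_edges.get(w, []):
                # Two-step paths longer than v's longest edge cannot
                # explain any direct edge out of v, so stop early.
                if vw_len + wx_len > longest:
                    break
                # Assumes overlaps are mutually consistent, so reaching x
                # through w makes the direct edge v->x redundant.
                if mark.get(x, VACANT) == INPLAY:
                    mark[x] = ELIMINATED

        reduced[v] = [(w, n) for w, n in out if mark[w] != ELIMINATED]

        # Reset marks before processing the next vertex.
        for w, _ in out:
            mark[w] = VACANT

    return reduced


# Three reads tiled along a sequence: A overlaps B, B overlaps C, A overlaps C.
overlaps = {"A": [("B", 2), ("C", 5)], "B": [("C", 3)], "C": []}
print(transitive_reduction(overlaps))
# {'A': [('B', 2)], 'B': [('C', 3)], 'C': []}
```

In the toy input, the edge from A to C is dropped because the path A to B to C accounts for the same five bases of overhang, leaving a simple chain.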