Joint base-calling of two DNA sequences with factor graphs

  • Authors:
  • Xiaomeng Shi; Desmond S. Lun; Muriel Médard; Ralf Kötter; James C. Meldrim; Andrew J. Barry

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA; Phenomics and Bioinformatics Research Centre, The School of Mathematics and Statistics, and The Australian Centre for Plant Functional Genomics, University of South Australia, Mawson Lakes, SA, Au ...; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA; Institute for Communications Engineering, Technical University of Munich, Munich, Germany; DNA Sequencing Operations Group, Broad Institute of MIT and Harvard, Cambridge, MA; DNA Sequencing Operations Group, Broad Institute of MIT and Harvard, Cambridge, MA

  • Venue:
  • IEEE Transactions on Information Theory - Special issue on information theory in molecular biology and neuroscience
  • Year:
  • 2010

Abstract

Automated estimation of DNA base sequences is an important step in genomics and in many other emerging fields in the biological and medical sciences. Current automated sequencers process single strands only. To improve the utility of existing technologies, we propose to mix two independent strands prior to electrophoresis and to base-call them jointly by applying the sum-product algorithm on factor graphs. We first present a statistical model for DNA sequencing data and examine the model parameters. We then propose a practical heuristic to estimate the peaks, which are separated into two source sequences (major and minor) by passing messages on a factor graph. Simulation results show that joint base-calling can provide less accurate but valid results for the minor sequence. The algorithm presented provides a basis for future investigation of joint sequencing techniques.
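
As a rough illustration of the message-passing step named in the abstract, the sketch below runs the sum-product algorithm on a small chain-shaped factor graph and checks the resulting marginals against brute-force enumeration. It is not the authors' statistical model. The chain structure, the two-state variables, the function names `chain_marginals` and `brute_force_marginals`, and all factor values are assumptions chosen only to show how local messages combine into per-position beliefs.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Sum-product on a chain factor graph (hypothetical toy model).

    unary[t][x]    : local evidence factor for variable t in state x
    pairwise[x][y] : factor linking state x at position t to state y at t + 1
    Returns one normalized marginal distribution per variable.
    """
    n, k = len(unary), len(unary[0])

    # Forward messages: fwd[t][y] summarizes everything to the left of t.
    fwd = [[1.0] * k for _ in range(n)]
    for t in range(1, n):
        for y in range(k):
            fwd[t][y] = sum(
                fwd[t - 1][x] * unary[t - 1][x] * pairwise[x][y]
                for x in range(k)
            )

    # Backward messages: bwd[t][x] summarizes everything to the right of t.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for x in range(k):
            bwd[t][x] = sum(
                pairwise[x][y] * unary[t + 1][y] * bwd[t + 1][y]
                for y in range(k)
            )

    # Belief at each variable: product of incoming messages and local evidence.
    marginals = []
    for t in range(n):
        belief = [fwd[t][x] * unary[t][x] * bwd[t][x] for x in range(k)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Reference marginals obtained by enumerating every joint assignment."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        weight = 1.0
        for t, x in enumerate(states):
            weight *= unary[t][x]
        for t in range(n - 1):
            weight *= pairwise[states[t]][states[t + 1]]
        for t, x in enumerate(states):
            totals[t][x] += weight
    return [[v / sum(row) for v in row] for row in totals]


if __name__ == "__main__":
    # Assumed, made-up factor values for three positions with two states each.
    unary = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
    pairwise = [[0.7, 0.3], [0.3, 0.7]]
    for t, m in enumerate(chain_marginals(unary, pairwise)):
        print(t, [round(p, 4) for p in m])
```

On a chain the graph has no cycles, so the message-passing marginals are exact and should agree with the enumeration. The enumeration cost grows exponentially with the number of positions, while the message-passing cost grows linearly in it.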