Transforming Genomes Using MOD Files with Applications

Authors:
Shunping Huang; Chia-Yu Kao; Leonard McMillan; Wei Wang
Affiliations:
Dept. of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA (Huang, Kao, McMillan); Dept. of Computer Science, University of California, Los Angeles, CA 90095, USA (Wang)
Venue:
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Year:
2013

Abstract

Next-generation sequencing techniques have enabled new methods of DNA and RNA quantification. Many of these methods require aligning short reads to a reference genome. If the target organism differs significantly from this reference, alignment errors can lead to substantial errors in downstream analysis. Various attempts have been made to integrate known genetic variants into the reference genome in order to construct sample-specific genomes that improve read alignment. However, many hurdles in generating and annotating such genomes remain unsolved. In this paper, we propose a general framework for mapping back and forth between genomes. It employs a new format, MOD, to represent known variants between genomes, together with a set of tools that facilitate genome manipulation and coordinate mapping. We demonstrate the utility of this framework using three highly divergent inbred mouse strains. Based on MOD files, we built pseudogenomes for these strains from the mm9 mouse reference genome and used them to map gene annotations onto the new genomes. We observe that a large fraction of genes have their positions or ranges altered. Finally, using RNA-seq and DNA-seq short reads from these strains, we demonstrate that mapping to the new genomes yields better alignments than mapping to the standard reference. The MOD files for the 17 mouse strains sequenced in the Wellcome Trust Sanger Institute's Mouse Genomes Project can be found at http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo. The auxiliary tools, MODtools and Lapels, are written in Python and available at http://code.google.com/p/modtools/ and http://code.google.com/p/lapels/.
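The abstract does not spell out the MOD syntax, so the following is only a minimal Python sketch of the underlying idea: applying known variants to a reference to obtain a pseudogenome, then mapping coordinates in both directions. It assumes a hypothetical `Variant(kind, pos, seq)` record with 0-based reference positions, where `kind` is `'s'` (single-base substitution), `'i'` (insertion before `pos`) or `'d'` (deletion of `seq` starting at `pos`), and assumes variants do not overlap. The names `build_pseudogenome`, `ref_to_pseudo` and `pseudo_to_ref` are illustrative and are not the MODtools or Lapels API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    kind: str  # hypothetical encoding: 's' SNP, 'i' insertion, 'd' deletion
    pos: int   # 0-based position in the reference
    seq: str   # new base ('s'), inserted bases ('i') or deleted bases ('d')


def build_pseudogenome(ref, variants):
    """Apply non-overlapping variants to ref; return (pseudo, blocks).

    blocks lists (ref_start, pseudo_start, length) for runs of bases that
    exist in both genomes (unchanged bases and SNPs), in increasing order.
    """
    # Insertions at a position go before a SNP/deletion at the same position.
    ordered = sorted(variants, key=lambda v: (v.pos, 0 if v.kind == "i" else 1))
    pieces, blocks = [], []
    cursor = 0   # next reference index not yet emitted
    out_len = 0  # current pseudogenome length

    def emit(seq, ref_start=None):
        nonlocal out_len
        if seq and ref_start is not None:
            blocks.append((ref_start, out_len, len(seq)))
        pieces.append(seq)
        out_len += len(seq)

    for v in ordered:
        if v.pos < cursor:
            raise ValueError(f"overlapping variant at {v.pos}")
        emit(ref[cursor:v.pos], ref_start=cursor)  # copy unchanged bases
        cursor = v.pos
        if v.kind == "s":
            if len(v.seq) != 1:
                raise ValueError("a SNP must replace exactly one base")
            emit(v.seq, ref_start=v.pos)  # substituted base keeps its coordinate
            cursor += 1
        elif v.kind == "i":
            emit(v.seq)  # inserted bases have no reference coordinate
        elif v.kind == "d":
            if ref[v.pos:v.pos + len(v.seq)] != v.seq:
                raise ValueError(f"deleted bases do not match reference at {v.pos}")
            cursor += len(v.seq)  # skip deleted reference bases
        else:
            raise ValueError(f"unknown variant kind {v.kind!r}")
    emit(ref[cursor:], ref_start=cursor)
    return "".join(pieces), blocks


def _lookup(pos, blocks, src, dst):
    """Binary-search the block containing pos on side src; None if absent."""
    starts = [b[src] for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < blocks[i][src] + blocks[i][2]:
        return blocks[i][dst] + (pos - blocks[i][src])
    return None


def ref_to_pseudo(pos, blocks):
    return _lookup(pos, blocks, 0, 1)  # None if the base was deleted


def pseudo_to_ref(pos, blocks):
    return _lookup(pos, blocks, 1, 0)  # None if the base was inserted


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    variants = [Variant("s", 1, "T"), Variant("i", 4, "GG"), Variant("d", 6, "GT")]
    pseudo, blocks = build_pseudogenome(ref, variants)
    print(pseudo)                      # ATGTGGACAC
    print(ref_to_pseudo(4, blocks))    # 6
    print(ref_to_pseudo(6, blocks))    # None (deleted in the pseudogenome)
    print(pseudo_to_ref(4, blocks))    # None (inserted base)
```

Recording the shared runs as blocks turns both mapping directions into a binary search. Positions that fall inside a deletion (reference side) or an insertion (pseudogenome side) have no counterpart and map to `None`; handling those cases is what makes a gene's position or range change when annotations are carried onto a new genome.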