Next-generation sequencing techniques have enabled new methods of DNA and RNA quantification. Many of these methods require aligning short reads to a reference genome. If the target organism differs significantly from this reference, alignment errors can lead to significant errors in downstream analysis. Various attempts have been made to integrate known genetic variants into the reference genome, constructing sample-specific genomes that improve read alignment. However, many hurdles in generating and annotating such genomes remain. In this paper, we propose a general framework for mapping back and forth between genomes. It employs a new format, MOD, to represent known variants between genomes, and a set of tools that facilitate genome manipulation and mapping. We demonstrate the utility of this framework using three inbred mouse strains. We built pseudogenomes from the mm9 mouse reference genome for three highly divergent mouse strains based on MOD files and used them to map the gene annotations to these new genomes. We observe that a large fraction of genes have their positions or ranges altered. Finally, using RNA-seq and DNA-seq short reads from these strains, we demonstrate that mapping to the new genomes yields better alignments than mapping to the standard reference. The MOD files for the 17 mouse strains sequenced in the Wellcome Trust Sanger Institute's Mouse Genomes Project can be found at http://www.csbio.unc.edu/CCstatus/index.py?run=Pseudo. The auxiliary tools (MODtools and Lapels), written in Python, are available at http://code.google.com/p/modtools/ and http://code.google.com/p/lapels/.
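To make the idea of mapping back and forth between genomes concrete, the following is a minimal sketch of how known variants could be applied to a reference to build a pseudogenome while recording the collinear blocks needed to translate coordinates in both directions. It is not the MOD format and does not use the MODtools or Lapels APIs: the `(kind, pos, seq)` variant tuples and the functions `apply_variants`, `ref_to_new`, and `new_to_ref` are hypothetical names introduced only for illustration, and positions are assumed to be 0-based, sorted, and non-overlapping.

```python
from bisect import bisect_right


def apply_variants(ref, variants):
    """Build a pseudogenome from `ref` and a sorted list of variants.

    Each variant is a hypothetical (kind, pos, seq) tuple (not the MOD format):
      ("snp", pos, base)  replace the reference base at pos with base
      ("ins", pos, seq)   insert seq before reference position pos
      ("del", pos, seq)   delete the reference bases seq starting at pos
    Returns the new sequence and a list of collinear blocks
    (ref_start, new_start, length) shared by both genomes.
    """
    pieces, blocks = [], []
    ref_cur = 0  # next unconsumed reference position
    new_cur = 0  # length of the pseudogenome built so far

    def copy_until(ref_end):
        # Copy unchanged reference bases and record them as one block.
        nonlocal ref_cur, new_cur
        length = ref_end - ref_cur
        if length > 0:
            pieces.append(ref[ref_cur:ref_end])
            blocks.append((ref_cur, new_cur, length))
            ref_cur += length
            new_cur += length

    for kind, pos, seq in variants:
        copy_until(pos)
        if kind == "snp":
            pieces.append(seq)
            blocks.append((ref_cur, new_cur, 1))  # base differs, position maps
            ref_cur += 1
            new_cur += 1
        elif kind == "ins":
            pieces.append(seq)
            new_cur += len(seq)  # inserted bases have no reference position
        elif kind == "del":
            if ref[pos:pos + len(seq)] != seq:
                raise ValueError("deleted bases do not match the reference")
            ref_cur += len(seq)  # deleted bases have no pseudogenome position
        else:
            raise ValueError("unknown variant kind: %r" % (kind,))
    copy_until(len(ref))
    return "".join(pieces), blocks


def _lookup(blocks, pos, src, dst):
    # Find the block whose source start is the largest one <= pos.
    starts = [b[src] for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    if pos < block[src] + block[2]:
        return block[dst] + (pos - block[src])
    return None  # pos falls in a deleted or inserted stretch


def ref_to_new(blocks, pos):
    """Map a reference position to the pseudogenome (None if deleted)."""
    return _lookup(blocks, pos, 0, 1)


def new_to_ref(blocks, pos):
    """Map a pseudogenome position back to the reference (None if inserted)."""
    return _lookup(blocks, pos, 1, 0)


ref = "ACGTACGTAC"
variants = [("snp", 1, "T"), ("ins", 4, "GG"), ("del", 7, "TA")]
pseudo, blocks = apply_variants(ref, variants)
```

In this sketch a gene annotation could be carried over by mapping its start and end with `ref_to_new`, and a read aligned to the pseudogenome could be reported in reference coordinates with `new_to_ref`. Positions that fall inside a deletion or an insertion return `None`, and how such cases should be resolved is a design decision this example leaves open.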