Identifying SNPs without a reference genome by comparing raw reads

Authors:
Pierre Peterlongo;Nicolas Schnel;Nadia Pisanti;Marie-France Sagot;Vincent Lacroix
Affiliations:
INRIA Rennes, Bretagne Atlantique, EPI Symbiose, Rennes, France;INRIA Rennes, Bretagne Atlantique, EPI Symbiose, Rennes, France;Dipartimento di Informatica, Università di Pisa, Italy;INRIA Rhône-Alpes, Montbonnot Saint-Martin, France and Université de Lyon, Lyon, CNRS, UMR, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France;INRIA Rhône-Alpes, Montbonnot Saint-Martin, France and Université de Lyon, Lyon, CNRS, UMR, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
Venue:
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Year:
2010

Abstract

Next-generation sequencing (NGS) technologies are being applied to many fields of biology, notably to survey polymorphism across individuals of a species. However, while single nucleotide polymorphisms (SNPs) are almost routinely identified in model organisms, detecting SNPs in non-model species remains very challenging because almost all methods rely on a reference genome. We address here the problem of identifying SNPs without a reference genome. To this end, we propose an approach that compares two sets of raw reads. We show that a SNP corresponds to a recognisable pattern in the de Bruijn graph built from the reads, and we propose algorithms to identify these patterns, which we call mouths. We outline the potential of our method on real data. The method is tailored to short reads (typically Illumina) and works well even when coverage is low, in which case it reports few but highly confident SNPs. Our program, called kisSnp, can be downloaded here: http://alcovna.genouest.org/kissnp/.
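
To illustrate the pattern the abstract refers to, here is a minimal Python sketch of how a single substitution shows up in a de Bruijn graph built from two read sets: a branching k-mer, two parallel paths of k k-mers each, and a k-mer where the paths merge again. This is not the kisSnp implementation and does not reproduce the paper's definition of a mouth. The function names (`kmers`, `successors`, `walk`, `find_snp_bubbles`) are hypothetical, and the sketch assumes error-free reads, a single strand (no reverse complements), and no coverage filtering.

```python
from itertools import combinations


def kmers(reads, k):
    """Return the set of all k-mers occurring in the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(node, nodes):
    """K-mers in the graph that overlap `node` by k-1 characters."""
    return [node[1:] + c for c in "ACGT" if node[1:] + c in nodes]


def walk(first, nodes, k):
    """Follow a non-branching path of exactly k nodes starting at `first`.

    Returns the list of nodes, or None if the path branches or stops early.
    """
    path = [first]
    while len(path) < k:
        succ = successors(path[-1], nodes)
        if len(succ) != 1:
            return None
        path.append(succ[0])
    return path


def find_snp_bubbles(reads_a, reads_b, k):
    """Report pairs of sequences that differ at one position (candidate SNPs).

    Assumption for this sketch: a candidate SNP is a node with two
    successors whose paths of k nodes converge on a common successor.
    """
    nodes = kmers(reads_a, k) | kmers(reads_b, k)
    bubbles = []
    for start in sorted(nodes):
        for first, second in combinations(successors(start, nodes), 2):
            p1, p2 = walk(first, nodes, k), walk(second, nodes, k)
            if p1 is None or p2 is None:
                continue
            # Both paths must be able to merge into the same next k-mer.
            if set(successors(p1[-1], nodes)) & set(successors(p2[-1], nodes)):
                seq1 = start + "".join(n[-1] for n in p1)
                seq2 = start + "".join(n[-1] for n in p2)
                bubbles.append((seq1, seq2))
    return bubbles


# Toy reads from two individuals that differ by one G/T substitution.
reads_a = ["ACGTTAGCC", "TAGCCATGA"]
reads_b = ["ACGTTATCC", "TATCCATGA"]
print(find_snp_bubbles(reads_a, reads_b, 4))
# prints [('GTTAGCCA', 'GTTATCCA')]
```

In the toy output, the two reported sequences differ only at index k (here 4), which is the polymorphic position. On real data, sequencing errors and repeats also create branches, so a practical tool needs additional filtering that this sketch leaves out.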