Contig selection in physical mapping

  • Authors:
  • Steffen Heber;Jens Stoye;Jörg Hoheisel;Martin Vingron

  • Affiliations:
  • German Cancer Research Center (DKFZ), Functional Genome Analysis (H0800), Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany;German Cancer Research Center (DKFZ), Theoretical Bioinformatics (H0300), Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany;German Cancer Research Center (DKFZ), Functional Genome Analysis (H0800), Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany;German Cancer Research Center (DKFZ), Theoretical Bioinformatics (H0300), Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany

  • Venue:
  • RECOMB '00 Proceedings of the fourth annual international conference on Computational molecular biology
  • Year:
  • 2000

Abstract

In physical mapping, one orders a set of genetic landmarks or a library of cloned DNA fragments according to their position in the genome. This is a preparatory step for efficient sequencing.

Our approach to physical mapping divides the problem into smaller and easier subproblems by partitioning the probe set into independent parts (contigs). The focus is on selecting probe sets that can be grouped together into contigs. We introduce a new distance function between probes, the averaged rank distance (ARD). The ARD measures the reliability of certain probe configurations in physical maps that are generated by bootstrap resampling of the raw data, which mimics an independent repetition of the experiment in silico. The ARD measures the distances between probes within a contig and smooths the distances between probes in different contigs. It shows distinct jumps at contig borders, which makes it appropriate for contig selection by clustering. We designed a physical mapping algorithm that makes use of these observations and seems to be particularly well suited to the delineation of reliable contigs.

We evaluated our method on data sets from two physical mapping projects. Compared with a physical map of Pasteurella haemolytica that was computed using simulated annealing, the newly computed map is considerably cleaner. On data from Xylella fastidiosa, the contigs produced by the new method could be compared to a map produced by a group of experts, and the two maps largely agree in the definition of the contigs. The results of our method have already proven helpful for the design of experiments aimed at further improving the quality of a map.
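The abstract does not give the formal definition of the ARD, so the following is only a toy sketch of the general idea, under stated assumptions. It assumes each bootstrap resample yields one linear ordering of the probes, takes the distance between two probes to be their absolute rank difference averaged over all orderings, and groups probes by single-linkage clustering with a fixed threshold. The function names, the example orderings, and the threshold are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def averaged_rank_distance(orderings):
    """Mean absolute rank difference of every probe pair across orderings.

    `orderings` is a list of probe orderings over the same probe set; here
    they stand in for maps computed from bootstrap resamples (an assumption).
    """
    probes = sorted(orderings[0])
    totals = {pair: 0 for pair in combinations(probes, 2)}
    for order in orderings:
        rank = {probe: i for i, probe in enumerate(order)}
        for a, b in totals:
            totals[(a, b)] += abs(rank[a] - rank[b])
    return {pair: total / len(orderings) for pair, total in totals.items()}


def single_linkage_clusters(probes, dist, threshold):
    """Merge probes connected by pairwise distances <= threshold (union-find)."""
    parent = {p: p for p in probes}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in probes:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical resampled maps: probes A, B, C always stay adjacent, as do
# D and E, but the placement and orientation of the two groups varies.
orderings = [
    ["A", "B", "C", "D", "E"],
    ["D", "E", "A", "B", "C"],
    ["A", "B", "C", "E", "D"],
    ["E", "D", "A", "B", "C"],
]

ard = averaged_rank_distance(orderings)
contigs = single_linkage_clusters("ABCDE", ard, threshold=1.5)
print(ard[("A", "B")], ard[("A", "D")])  # within-group vs. across-group
print(contigs)
```

In this toy input, neighbouring probes within a stable group keep a small averaged distance, while every pair that spans the two groups averages out to the same larger value. That is the kind of jump at a contig border that a clustering step can pick up.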