Multi-marker tagging single nucleotide polymorphism selection using estimation of distribution algorithms

  • Authors:
  • Roberto Santana;Alexander Mendiburu;Noah Zaitlen;Eleazar Eskin;Jose A. Lozano

  • Affiliations:
  • Faculty of Informatics, Universidad Politécnica de Madrid, R. 3306, Campus de Montegancedo, 28660 Boadilla del Monte, Madrid, Spain;Intelligent Systems Group, University of the Basque Country, Paseo Manuel de Lardizábal 1, 20018 San Sebastian - Donostia, Spain;Computer Science and Human Genetics Group, University of California 1596, 3532-J Boelter Hall, Los Angeles, CA 90095-1596, USA;Computer Science and Human Genetics Group, University of California 1596, 3532-J Boelter Hall, Los Angeles, CA 90095-1596, USA;Intelligent Systems Group, University of the Basque Country, Paseo Manuel de Lardizábal 1, 20018 San Sebastian - Donostia, Spain

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2010

Abstract

Objectives: This paper presents an optimization algorithm for the automatic selection of a minimal subset of tagging single nucleotide polymorphisms (SNPs).

Methods and materials: The determination of the minimal set of tagging SNPs is approached as an optimization problem in which each tagged SNP can be covered either by a single tagging SNP or by a pair of tagging SNPs. The problem is solved using an estimation of distribution algorithm (EDA) that exploits the underlying topological structure defined by the SNP correlations to model the problem interactions. The EDA stochastically searches the constrained space of feasible solutions. The algorithm is evaluated on HapMap reference panel data sets.

Results: The EDA was compared with a SAT solver, which finds minimal single-marker tagging sets, and with the Tagger program. The minimal multi-marker tagging sets found by the EDA contained 10% to 43% fewer tagging SNPs than the sets found by the other algorithms.

Conclusions: The introduced algorithm is effective for the identification of minimal multi-marker SNP sets, which considerably reduce the size of the tagging SNP set in comparison with single-marker sets. Other variants of the SNP problem can be treated following the same approach.
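The problem formulation in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the paper's EDA models interactions derived from the SNP correlation structure, whereas this sketch substitutes a plain univariate EDA (UMDA-style) with a simple repair step to stay inside the feasible space. All names (`covered_snps`, `repair`, `umda_tagging`, `single_cover`, `pair_cover`) and the toy instance are hypothetical, and the coverage relations are assumed to be precomputed, for example from a correlation threshold.

```python
import random
from itertools import combinations

def covered_snps(selected, single_cover, pair_cover):
    """Return the SNPs covered by a set of tagging SNPs (assumed model)."""
    covered = set(selected)  # a tagging SNP trivially covers itself
    for s in selected:
        covered |= single_cover.get(s, set())
    # pair_cover keys are assumed to be sorted tuples (a, b) with a < b
    for a, b in combinations(sorted(selected), 2):
        covered |= pair_cover.get((a, b), set())
    return covered

def repair(selected, n, single_cover, pair_cover):
    """Make a candidate feasible by adding each SNP that is still uncovered."""
    selected = set(selected)
    for snp in range(n):
        if snp not in covered_snps(selected, single_cover, pair_cover):
            selected.add(snp)
    return selected

def umda_tagging(n, single_cover, pair_cover,
                 pop_size=60, generations=40, seed=0):
    """Univariate EDA sketch: sample, repair, select, re-estimate."""
    rng = random.Random(seed)
    probs = [0.5] * n          # independent inclusion probability per SNP
    best = set(range(n))       # selecting every SNP is always feasible
    for _ in range(generations):
        population = []
        for _ in range(pop_size):
            sample = {i for i in range(n) if rng.random() < probs[i]}
            population.append(repair(sample, n, single_cover, pair_cover))
        population.sort(key=len)            # fewer tagging SNPs is better
        if len(population[0]) < len(best):
            best = population[0]
        elite = population[: pop_size // 2]  # truncation selection
        # Re-estimate marginals from the elite, clamped to keep diversity
        probs = [min(0.95, max(0.05, sum(i in ind for ind in elite) / len(elite)))
                 for i in range(n)]
    return best

# Hypothetical toy instance with six SNPs, numbered 0 to 5
single_cover = {0: {1}, 3: {4}}   # SNP 0 alone tags SNP 1; SNP 3 alone tags SNP 4
pair_cover = {(0, 3): {2, 5}}     # SNPs 0 and 3 together tag SNPs 2 and 5
best = umda_tagging(6, single_cover, pair_cover)
print(sorted(best))
```

In this toy instance the pair of tagging SNPs 0 and 3 covers all six SNPs, while a single-marker solution needs four tagging SNPs (0, 2, 3 and 5), which shows why allowing pairs can shrink the tagging set. The repair step only adds SNPs, so the returned set is always feasible but is not guaranteed to be minimal.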