Logic based methods for SNPs tagging and reconstruction

Authors:
Paola Bertolazzi;Giovanni Felici;Paola Festa
Affiliations:
Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" del CNR, Viale Manzoni 30, 00185 Rome, Italy;Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" del CNR, Viale Manzoni 30, 00185 Rome, Italy;Dipartimento di Matematica e Applicazioni "R. Caccioppoli", Università degli Studi di Napoli FEDERICO II, Compl. MSA, Via Cintia, 80126 Napoli, Italy
Venue:
Computers and Operations Research
Year:
2010


Abstract

Single nucleotide polymorphisms (SNPs) are the positions in a DNA sequence at which individuals differ. Knowledge of these SNPs is crucial for disease association studies, but although such positions are few (about 1% of the entire sequence), extracting the complete information is very costly. Recent studies have shown that DNA sequences are organized into blocks of positions that are conserved during evolution and within which the values (alleles) at different loci are strongly correlated. This block structure suggests a way to reduce the cost of extracting SNP information: restrict the process to a subset of SNPs, the so-called Tag SNPs, that retains most of the information contained in the whole sequence. In this paper, we apply a feature selection technique based on integer programming to the problem of Tag SNP selection. To assess the quality of our approach, we also consider the problem of SNP reconstruction, i.e., deriving the unknown SNPs from the values of the Tag SNPs, and we propose two reconstruction methods, one based on majority vote and the other on a machine learning approach. We test our algorithm on two public data sets of different nature and obtain results that are, where comparable, in line with the related literature. Notable aspects of the proposed method are its ability to handle very large sets of SNPs simultaneously and, in addition, to provide highly informative reconstruction rules in the form of logic formulas.
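The abstract does not give the paper's integer programming model or its reconstruction rules, so the following is only an illustrative sketch of the two tasks it names, not the authors' method. It assumes haplotypes are encoded as equal-length lists of 0/1 alleles. It casts Tag SNP selection as a covering problem (every pair of distinct haplotypes must differ on at least one selected SNP), and it solves that problem with a simple greedy heuristic in place of integer programming. The majority-vote step is likewise a guess at one plausible reading: each unknown SNP takes the most common allele among the training haplotypes closest to the sample on the Tag SNPs. The function names `select_tag_snps` and `reconstruct` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def select_tag_snps(haplotypes):
    """Greedy covering heuristic (an assumption, not the paper's IP model).

    Picks SNP columns until every pair of distinct haplotypes differs
    on at least one picked column.
    """
    n_snps = len(haplotypes[0])
    # Pairs of rows that still need to be told apart; identical rows are skipped.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }
    tags = []
    while uncovered:
        # Column separating the most unseparated pairs; ties go to the lowest index.
        best = max(
            range(n_snps),
            key=lambda s: (
                sum(haplotypes[i][s] != haplotypes[j][s] for i, j in uncovered),
                -s,
            ),
        )
        tags.append(best)
        uncovered = {
            (i, j) for i, j in uncovered if haplotypes[i][best] == haplotypes[j][best]
        }
    return sorted(tags)


def reconstruct(tag_values, tags, training):
    """Majority vote among the training rows nearest to the sample on the tags.

    tag_values maps each tag SNP index to its observed allele.
    """
    def distance(row):
        # Hamming distance restricted to the tag positions.
        return sum(row[t] != tag_values[t] for t in tags)

    d_min = min(distance(row) for row in training)
    neighbours = [row for row in training if distance(row) == d_min]
    result = []
    for s in range(len(training[0])):
        if s in tag_values:
            result.append(tag_values[s])  # observed allele, kept as is
        else:
            votes = Counter(row[s] for row in neighbours)
            result.append(votes.most_common(1)[0][0])
    return result


# Toy data (made up for illustration): columns 2 and 3 mirror columns 0 and 1.
toy = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
toy_tags = select_tag_snps(toy)
toy_full = reconstruct({0: 1, 1: 0}, toy_tags, toy)
```

On the toy data the two redundant columns are never selected, and a sample observed only at the selected positions is completed from the matching training haplotype. A greedy pass gives no optimality guarantee. An exact integer programming formulation, as the abstract describes, would minimize the number of selected SNPs subject to the same covering constraints.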