Evolving Regular Expressions for GeneChip Probe Performance Prediction

  • Authors:
  • William B. Langdon;Andrew P. Harrison

  • Affiliations:
  • Departments of Mathematical, Biological Sciences and Computing and Electronic Systems, University of Essex, UK;Departments of Mathematical, Biological Sciences and Computing and Electronic Systems, University of Essex, UK

  • Venue:
  • Proceedings of the 10th International Conference on Parallel Problem Solving from Nature: PPSN X
  • Year:
  • 2008

Abstract

Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. We use correlations between measurements for the same gene across 6685 human tissue samples from NCBI's GEO database to indicate the quality of individual HG-U133A probes. Low concordance indicates a poor probe. Regular expressions can be data mined using a Backus-Naur form (BNF) context-free grammar and strongly typed genetic programming, written in gawk and using egrep. The automatically produced motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of cytosine and guanine, and mixtures of the two, should all be avoided.
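
The two ideas in the abstract, scoring a probe by its concordance with the other probes for the same gene and flagging probe sequences with a regular expression, can be illustrated with a minimal Python sketch. It is not the authors' implementation, which the abstract says was written in gawk and used egrep. The function names, the toy data, and the motif `G{4,}|C{4,}` are assumptions chosen for illustration only; the evolved motif from the paper is not reproduced here.

```python
import math
import re

# Hypothetical motif: a run of four or more G, or four or more C.
# This is an illustrative stand-in, NOT the regular expression evolved in the paper.
POOR_PROBE_RE = re.compile(r"G{4,}|C{4,}")


def pearson(x, y):
    """Pearson correlation between two equal-length lists of measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_concordance(probeset):
    """For each probe, average its correlation with the other probes.

    `probeset` is a list of probes targeting the same gene; each probe is a
    list of measurements, one per tissue sample. A low score suggests the
    probe disagrees with its peers.
    """
    scores = []
    for i, probe in enumerate(probeset):
        others = [pearson(probe, other)
                  for j, other in enumerate(probeset) if j != i]
        scores.append(sum(others) / len(others))
    return scores


def matches_poor_motif(sequence):
    """Return True if the probe sequence contains the hypothetical motif."""
    return POOR_PROBE_RE.search(sequence.upper()) is not None
```

In this sketch, a probe whose measurements move against the others in its probeset receives a low mean correlation, and a sequence such as `ATGGGGTC` is flagged by the stand-in motif while `ATGCATGC` is not. Comparing the two outputs over many probes is the kind of check a motif would need to pass to count as a useful predictor.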