Creating regular expressions as mRNA motifs with GP to predict human exon splitting

  • Authors:
  • William B. Langdon; J. Rowsell; A. P. Harrison

  • Affiliations:
  • King's College London, London, UK; Essex University, CO4 3SQ, UK; Essex University, CO4 3SQ, UK

  • Venue:
  • Proceedings of the 11th Annual conference on Genetic and evolutionary computation
  • Year:
  • 2009

Abstract

RNAnet [3] http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet/ allows the user to calculate correlations of gene expression, both between genes and between components within genes. We investigate all of Ensembl http://www.ensembl.org and find all the Homo sapiens exons for which there are sufficient robust Affymetrix HG-U133 Plus 2 GeneChip probes. Calculating the correlation between mRNA probe measurements for the same exon shows many exons whose components are consistently up- and down-regulated together. However, we also identify other Ensembl exons in which sub-regions are self-consistent but these transcript blocks are not well correlated with other blocks in the same exon. We suggest that many current Ensembl exon definitions are incomplete. Secondly, having identified exons with substructure, we use machine learning to try to identify patterns in the DNA sequence lying between blocks of high correlation, which might yield biological or technological explanations. A Backus-Naur form (BNF) context-free grammar constrains strongly typed genetic programming (STGP) to evolve biological motifs in the form of regular expressions (RE) (e.g. TCTTT) that distinguish exons with potential alternative mRNA expression from those without. We show that biological patterns can be data-mined from NCBI's GEO http://www.ncbi.nlm.nih.gov/geo/ database by a GP written in gawk that uses egrep. The automatically produced DNA motifs suggest that alternative polyadenylation is not responsible for this substructure. (Full version in TR-09-02 [7].) The blocky exons are available at http://bioinformatics.essex.ac.uk/users/wlangdon/tr-09-02.tar.gz
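The within-exon correlation analysis described in the abstract can be illustrated with a minimal Python sketch. It assumes each probe's measurements across many GeneChip arrays are already available as a list of numbers, with probes ordered along the exon, and it starts a new block wherever two neighbouring probes correlate below a threshold. The function names, the adjacent-probe rule and the 0.5 threshold are hypothetical illustrations, not the procedure used by RNAnet or in TR-09-02.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def split_into_blocks(probes, threshold=0.5):
    """Group probes (ordered along an exon) into self-consistent blocks.

    probes: one expression profile per probe, each a list with one value
            per GeneChip array (hypothetical input layout).
    A new block starts wherever adjacent probes correlate below threshold.
    Returns a list of blocks, each a list of probe indices.
    """
    if not probes:
        return []
    blocks = [[0]]
    for i in range(1, len(probes)):
        if pearson(probes[i - 1], probes[i]) >= threshold:
            blocks[-1].append(i)  # still consistent with the previous probe
        else:
            blocks.append([i])  # correlation breaks: start a new block
    return blocks
```

For example, with four probes whose first two profiles rise and fall together, whose last two also move together, but where the two pairs are anti-correlated with each other, `split_into_blocks` returns `[[0, 1], [2, 3]]`: an exon with two internally consistent transcript blocks. A real analysis would also need to handle noisy probes and compare non-adjacent probes, which this sketch ignores.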
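The grammar-constrained motif search in the abstract can be sketched in Python. This illustrates only two ingredients: growing candidate motifs from a context-free grammar, so that every individual is a syntactically valid regular expression, and scoring a motif by how well its presence separates two sets of DNA sequences. The toy grammar, the function names and the accuracy-based fitness are hypothetical; Python's `re` module stands in for egrep (the two agree on the plain bases and bracket classes this grammar produces), and the GP's selection, crossover and mutation are omitted.

```python
import random
import re

# Hypothetical toy grammar (NOT the grammar used in the paper):
#   <motif> ::= <elem> | <elem> <motif>
#   <elem>  ::= <base> | "[" <base> <base> "]"
#   <base>  ::= "A" | "C" | "G" | "T"
BASES = "ACGT"


def gen_base(rng):
    """<base>: one of the four DNA bases."""
    return rng.choice(BASES)


def gen_elem(rng):
    """<elem>: usually a single base, sometimes a two-base class such as [CT]."""
    if rng.random() < 0.7:
        return gen_base(rng)
    return "[" + gen_base(rng) + gen_base(rng) + "]"


def gen_motif(rng, depth=1, max_depth=6):
    """<motif>: expand the recursive rule, capped at max_depth elements."""
    elem = gen_elem(rng)
    if depth >= max_depth or rng.random() < 0.3:
        return elem  # <motif> ::= <elem>
    return elem + gen_motif(rng, depth + 1, max_depth)  # <motif> ::= <elem> <motif>


def fitness(motif, positives, negatives):
    """Classification accuracy when 'motif occurs in sequence' predicts positive."""
    pattern = re.compile(motif)
    true_pos = sum(1 for s in positives if pattern.search(s))
    true_neg = sum(1 for s in negatives if not pattern.search(s))
    return (true_pos + true_neg) / (len(positives) + len(negatives))
```

For instance, with positives `["AATCTTTG", "GGTCTTTA"]` and negatives `["AAAAAA", "CCCGGG"]`, `fitness("TCTTT", ...)` is 1.0 while `fitness("A", ...)` is 0.75. A random-search baseline is then `max((gen_motif(rng) for _ in range(200)), key=lambda m: fitness(m, pos, neg))`; a GP would instead breed new motifs from the fittest ones, with the grammar keeping crossover and mutation within valid expressions.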