Genetic algorithm and optimized weight matrix application for peroxisome proliferator response elements recognition: Prerequisites of accuracy growth for wide genome research

  • Authors:
  • Victor Levitsky;Elena Ignatieva;Eugenia Aman;Tatyana Merkulova;Nikolay Kolchanov;Charles Hodgman

  • Affiliations:
  • (Correspd. Tel.: +7 383 3333119/ Fax: +7 383 3331278/ E-mail: levitsky@bionet.nsc.ru) Lab. of Theoretical Genetics, Inst. of Cytology and Genetics, Novosibirsk and Dept. of Natural Science, Novosi ...;Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia and Department of Natural Science, Novosibirsk State University, Novosibirsk, 630090, Russia;Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia;Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia and Department of Natural Science, Novosibirsk State University, Novosibirsk, 630090, Russia;Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia and Department of Natural Science, Novosibirsk State University, Novosibirsk, 630090, Russia;Multidisciplinary Centre for Integrative Biology, School of Biosciences, University of Nottingham, Sutton Bonington, LE12 5RD, UK

  • Venue:
  • Intelligent Data Analysis - New Methods in Bioinformatics Presented at the Fifth International Conference on Bioinformatics of Genome Regulation and Structure
  • Year:
  • 2008

Abstract

Development of reliable transcription factor binding site (TFBS) recognition methods is an important step in large-scale genome analysis. Most currently applied methods for predicting functional TFBSs are hampered by high false-positive rates, which occur when too few functionally characterised sequences are available and only sequence conservation within the site core is considered. We propose two methods to search for binding sites (BSs) of the peroxisome proliferator-activated receptor (PPAR), known as peroxisome proliferator response elements (PPREs). The first method is an optimized dinucleotide position weight matrix (PWM) model; the second is the SiteGA model, which uses a genetic algorithm with a discriminant function of locally positioned dinucleotides to infer the most important positions and dinucleotides. We used two PPRE datasets in our analysis, consisting of 37 and 98 BSs, respectively. We showed that dataset extension improved the accuracy of the SiteGA model, but not of the PWM model. Finally, we applied both models (PWM and SiteGA) in combination to the dataset of annotated human promoters (EPD). We demonstrated that the larger dataset and the longer window length supported a notable growth in accuracy for the PWM and SiteGA models. Consequently, a combined PWM and SiteGA application may better restrict the number of potential targets in the EPD promoter dataset.
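
To make the idea of a dinucleotide PWM concrete, the following is a minimal Python sketch of how such a matrix could be built from aligned sites and used to scan a sequence. It is not the authors' optimized model: the log-odds form, the uniform background of 1/16 per dinucleotide, the pseudocount, and the names `build_dinuc_pwm`, `score_window`, `scan`, and `TOY_SITES` are all assumptions made for illustration, and the toy sites are invented DR1-like strings rather than entries from the 37- or 98-site datasets.

```python
import math
from itertools import product

# All 16 dinucleotides over the DNA alphabet.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

# Hypothetical, hand-made aligned sites (NOT taken from the paper's datasets).
TOY_SITES = [
    "AGGTCAAAGGTCA",
    "AGGTCAGAGGTCA",
    "AGGGCAAAGTTCA",
]


def build_dinuc_pwm(sites, pseudocount=1.0):
    """Build a log-odds dinucleotide weight matrix from equal-length sites.

    Assumes a uniform background frequency of 1/16 for every dinucleotide.
    Returns one dict (dinucleotide -> weight) per overlapping position pair.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length - 1):
        counts = {d: pseudocount for d in DINUCS}
        for site in sites:
            counts[site[pos:pos + 2].upper()] += 1
        total = sum(counts.values())
        pwm.append(
            {d: math.log2((c / total) / (1 / 16)) for d, c in counts.items()}
        )
    return pwm


def score_window(pwm, window):
    """Sum the weights of the overlapping dinucleotides in one window."""
    return sum(col[window[i:i + 2]] for i, col in enumerate(pwm))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at least `threshold`."""
    width = len(pwm) + 1
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous symbols such as N.
        if not set(window) <= set("ACGT"):
            continue
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

In this sketch the recognition threshold is left to the caller; in practice its choice governs the trade-off between false positives and missed sites that the abstract describes, and a combined application would keep only those windows that a second model also accepts.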