Analysis of E.coli promoter recognition problem in dinucleotide feature space

  • Authors:
  • T. Sobha Rani;S. Durga Bhavani;Raju S. Bapi

  • Affiliations:
  • Computational Intelligence Lab, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India;Computational Intelligence Lab, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India;Computational Intelligence Lab, Department of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India

  • Venue:
  • Bioinformatics
  • Year:
  • 2007

Abstract

Motivation: Patterns in promoter sequences within a species are known to be conserved, but there are many exceptions to this rule, which makes promoter recognition a complex problem. Although many complex feature extraction schemes coupled with various classifiers have been proposed for promoter recognition in the current literature, the problem remains open.

Results: In this article, a global dinucleotide feature extraction method is proposed for the recognition of sigma-70 promoters in Escherichia coli. The positive data set consists of sigma-70 promoters with known transcription start points, taken from the RegulonDB and PromEC databases. Four different kinds of negative data sets are considered: two biological sets (Gordon et al., 2003) and two synthetic sets. Our results show that a single-layer perceptron using dinucleotide features achieves an accuracy of 80% against a background of biological non-promoters and 96% against random data sets. A scheme for locating promoter regions in a given genome sequence is also proposed. A deeper analysis shows that the data set bifurcates into two distinct classes, a majority class and a minority class. Our results indicate that the majority class, comprising the majority promoter and majority non-promoter signals, is linearly separable, and that the minority class is also linearly separable. We further show that the feature extraction and classification methods proposed in the paper are generic enough to be applied to the more complex problem of eukaryotic promoter recognition, and we present Drosophila promoter recognition as a case study.

Availability: http://202.41.85.117/htmfiles/faculty/tsr/tsr.html

Contact: tsrcs@uohyd.ernet.in
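The idea of a dinucleotide feature space classified by a single-layer perceptron can be illustrated with a minimal Python sketch. It assumes that the "global" features are the 16 relative frequencies of overlapping dinucleotides in a sequence. The abstract does not give the paper's exact normalisation, window length, training schedule, or promoter-locating scheme, so the function names, learning rate, epoch count, and the sliding-window `scan` helper below are hypothetical illustrations, not the authors' implementation.

```python
from itertools import product

# The 16 dinucleotides in a fixed order (AA, AC, ..., TT) define the feature axes.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def dinucleotide_features(seq):
    """Return the relative frequencies of the 16 overlapping dinucleotides.

    Assumption: overlapping windows, normalised by the number of valid
    dinucleotides; windows containing non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    counts = [0.0] * 16
    total = 0
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def train_perceptron(X, y, epochs=100, lr=0.1):
    """Classic single-layer perceptron; labels in y must be +1 or -1.

    The epoch count and learning rate are illustrative defaults only.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * activation <= 0:  # misclassified or on the boundary
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                errors += 1
        if errors == 0:  # every training sample is correctly classified
            break
    return w, b


def predict(w, b, x):
    """Return +1 (promoter) or -1 (non-promoter) for a feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def scan(genome, w, b, window=80, step=1):
    """Hypothetical sliding-window scan (not the paper's locating scheme).

    Yields the start offsets of windows that the perceptron labels as
    promoters; window length and step are illustrative parameters.
    """
    for start in range(0, len(genome) - window + 1, step):
        features = dinucleotide_features(genome[start:start + window])
        if predict(w, b, features) == 1:
            yield start
```

As a toy usage example (made-up sequences, not the paper's data), training on AT-rich sequences labelled +1 and GC-rich sequences labelled -1 yields a separating hyperplane, because the two groups occupy disjoint dinucleotide axes. Because the perceptron only stops early when a full pass makes no errors, early stopping is guaranteed only for linearly separable training data, which is the property the abstract reports for the majority and minority classes.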