A new method to forecast of Escherichia coli promoter gene sequences: Integrating feature selection and Fuzzy-AIRS classifier system

  • Authors:
  • Kemal Polat; Salih Güneş

  • Affiliations:
  • Selcuk University, Department of Electrical and Electronics Engineering, 42075 Konya, Turkey;Selcuk University, Department of Electrical and Electronics Engineering, 42075 Konya, Turkey

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Quantified Score

Hi-index 12.06

Abstract

In this work, we investigate the real-world task of recognizing biological concepts in DNA sequences. Promoters are recognized in strings that represent nucleotides (one of A, G, T, or C) using a novel approach based on feature selection (FS) and the Artificial Immune Recognition System (AIRS) with a fuzzy resource allocation mechanism (Fuzzy-AIRS), which was first proposed by us. The aim of this study is to improve the prediction accuracy for Escherichia coli promoter gene sequences using this combination of FS and Fuzzy-AIRS. The E. coli promoter gene sequences dataset has 57 attributes and 106 samples, comprising 53 promoters and 53 non-promoters. The proposed system consists of two parts. First, the dimension of the dataset is reduced from 57 attributes to 4 attributes by means of the FS process. Second, the Fuzzy-AIRS classifier is run on the reduced dataset to predict the E. coli promoter gene sequences. The robustness of the proposed method is examined using prediction accuracy, sensitivity and specificity analysis, k-fold cross-validation, and the confusion matrix. Whereas the Fuzzy-AIRS classifier alone obtained 50% prediction accuracy using 10-fold cross-validation, the proposed system obtained 90% prediction accuracy under the same conditions. These results indicate that the proposed system is effective at recognizing promoters in strings that represent nucleotides.
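
The evaluation protocol named in the abstract (confusion matrix, accuracy, sensitivity, specificity, k-fold cross-validation) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the fold assignment is a simple round-robin stratification, and a 1-nearest-neighbour rule with Hamming distance stands in for the Fuzzy-AIRS classifier, which is not reproduced here. The label 1 is assumed to mean "promoter" and 0 "non-promoter".

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, ...]  # one sequence, one nucleotide per attribute


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn), treating label 1 (promoter) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def stratified_folds(labels: Sequence[int], k: int) -> List[List[int]]:
    """Deal the indices of each class round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        class_indices = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(class_indices):
            folds[position % k].append(index)
    return folds


def nearest_neighbour(train_x: Sequence[Sample], train_y: Sequence[int],
                      test_x: Sequence[Sample]) -> List[int]:
    """Placeholder classifier (NOT Fuzzy-AIRS): 1-NN under Hamming distance."""
    def hamming(a: Sample, b: Sample) -> int:
        return sum(1 for u, v in zip(a, b) if u != v)

    predictions = []
    for x in test_x:
        best = min(range(len(train_x)), key=lambda i: hamming(train_x[i], x))
        predictions.append(train_y[best])
    return predictions


def cross_validate(x: Sequence[Sample], y: Sequence[int],
                   fit_predict: Callable[[Sequence[Sample], Sequence[int], Sequence[Sample]], List[int]],
                   k: int = 10) -> Dict[str, float]:
    """Pool the predictions of all k held-out folds, then compute the metrics."""
    y_true_all: List[int] = []
    y_pred_all: List[int] = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predictions = fit_predict([x[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [x[i] for i in test_idx])
        y_true_all.extend(y[i] for i in test_idx)
        y_pred_all.extend(predictions)
    return metrics(*confusion_matrix(y_true_all, y_pred_all))
```

With a dataset shaped like the one described (106 samples, 53 per class, 4 selected attributes), a call such as `cross_validate(samples, labels, nearest_neighbour, k=10)` would return the three pooled metrics; any classifier with the same `fit_predict` signature could be substituted for the placeholder.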