Human Pol II promoter prediction by using nucleotide property composition features

  • Authors:
  • Wen-Lin Huang; Chun-Wei Tung; Shinn-Ying Ho

  • Affiliations:
  • Chin Min Institute of Technology, Miaoli, Taiwan; National Chiao Tung University, Hsinchu, Taiwan; National Chiao Tung University, Hsinchu, Taiwan

  • Venue:
  • ISB '10 Proceedings of the International Symposium on Biocomputing
  • Year:
  • 2010

Abstract

The RNA polymerase II (Pol II) promoter is a key region that regulates the differential transcription of protein-coding genes, and its identification is one of the most challenging problems in genome annotation. Although many promoter prediction methods and tools have been developed, they have not yet extracted informative features from large-scale DNA sequences to improve predictive accuracy. This work proposes ProPolyII, a prediction method that mines informative nucleotide property composition (NPC) features and uses them to design a support vector machine (SVM)-based classifier. ProPolyII is evaluated for promoter prediction on an existing data set, HumP (1872 human promoters and 1870 non-promoters). ProPolyII yields 70 informative NPC features with training and test accuracies of 99.1% and 95.1%, respectively. The 70 NPC features consist of 46 4-mer motifs, 3 nucleotide properties, and 21 global descriptors. These accuracies are better than those of Prom-Machine (94.9% and 91.1%), which uses the top 128 4-mer motifs, and M1 (97.4% and 93.6%), which uses 36 global descriptors. The high predictive performance indicates that ProPolyII can be beneficial for identifying promoters compared with other methods.
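A 4-mer motif feature is presumably derived from how often a given 4-nucleotide word occurs in a sequence. As a minimal sketch, assuming each composition feature is the relative frequency of a 4-mer over all overlapping windows (the abstract does not state the exact counting or normalization), the full 4^4 = 256-dimensional composition vector could be computed as follows. The function name `kmer_composition` is hypothetical. The sketch does not reproduce how ProPolyII selects its 46 informative 4-mers, nor how it computes nucleotide properties or global descriptors.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 4) -> dict[str, float]:
    """Hypothetical helper: relative frequency of every DNA k-mer in seq.

    Assumption: features are counts over overlapping windows divided by the
    number of valid windows. Windows containing non-ACGT symbols (e.g. 'N')
    are skipped and do not contribute to the total.
    """
    seq = seq.upper()
    # All 4**k possible k-mers, in a fixed lexicographic order (AAAA ... TTTT for k=4).
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # only pure-ACGT windows are counted
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k or containing no valid window.
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: c / total for kmer, c in counts.items()}
```

For example, `kmer_composition("TATAAAAGGC")` has seven overlapping windows (TATA, ATAA, TAAA, AAAA, AAAG, AAGG, AGGC), each occurring once, so those seven 4-mers each receive 1/7 and the remaining 249 receive 0. A fixed-order vector like this is the kind of input an SVM classifier expects; feature selection would then keep only a subset of its entries.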