Human Pol II promoter prediction by using nucleotide property composition features

Authors:
Wen-Lin Huang;Chun-Wei Tung;Shinn-Ying Ho
Affiliations:
Chin Min Institute of Technology, Miaoli, Taiwan;National Chiao Tung University, Hsinchu, Taiwan;National Chiao Tung University, Hsinchu, Taiwan
Venue:
ISB '10 Proceedings of the International Symposium on Biocomputing
Year:
2010

Abstract

The RNA polymerase II (Pol II) promoter is a key region that regulates the differential transcription of protein-coding genes, and its identification is one of the most challenging problems in genome annotation. Although many promoter prediction methods and tools have been developed, they have not yet extracted informative features from large-scale DNA sequences to improve predictive accuracy. A prediction method, ProPolyII, is proposed; it mines informative nucleotide property composition (NPC) features and uses them to design a support vector machine-based classifier. An existing data set, HumP (1872 human promoters and 1870 non-promoters), is used to evaluate ProPolyII for promoter prediction. ProPolyII yields 70 informative NPC features, with training and test accuracies of 99.1% and 95.1%, respectively. The 70 NPC features consist of 46 4-mer motifs, 3 nucleotide properties and 21 global descriptors. These accuracies are better than those of Prom-Machine (94.9% and 91.1%) and M1 (97.4% and 93.6%), which use the top 128 4-mer motifs and 36 global descriptors, respectively. The high predictive performance indicates that ProPolyII can be beneficial in the identification of promoters compared with other methods.
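
To illustrate the kind of feature the abstract refers to as a 4-mer motif, the following minimal sketch computes the normalized 4-mer composition of a DNA sequence. It is not the authors' implementation: the function names `all_kmers` and `kmer_composition`, the overlapping sliding window, and the handling of ambiguous bases are assumptions made for illustration. It covers only the 4-mer component, not the nucleotide properties, the global descriptors, or the feature selection step.

```python
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k=4):
    """Return all 4**k k-mers over ACGT in lexicographic order (hypothetical helper)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_composition(seq, k=4):
    """Return the normalized frequency of every k-mer in seq.

    Assumption: overlapping windows are counted, and windows containing
    characters other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    kmers = all_kmers(k)
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence too short or fully ambiguous: return an all-zero vector.
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Toy sequence for illustration only, not taken from the HumP data set.
features = kmer_composition("ACGTACGTAC")
```

Each sequence then maps to a 256-dimensional vector, from which a subset of informative 4-mers (46 in the abstract) could be selected and passed, together with other descriptors, to a support vector machine classifier.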