The RNA polymerase II (Pol II) promoter is a key region that regulates the differential transcription of protein-coding genes, and identifying it is one of the most challenging problems in genome annotation. Although many promoter prediction methods and tools have been developed, they have not yet extracted informative features from large-scale DNA sequences to improve predictive accuracy. A prediction method, ProPolyII, is proposed that mines informative nucleotide property composition (NPC) features to design a support vector machine (SVM) based classifier. An existing data set, HumP (1872 human promoters and 1870 non-promoters), is used to evaluate ProPolyII for promoter prediction. ProPolyII selects 70 informative NPC features and achieves training and test accuracies of 99.1% and 95.1%, respectively. The 70 NPC features consist of 46 4-mer motifs, 3 nucleotide properties, and 21 global descriptors. These accuracies are better than those of Prom-Machine (94.9% and 91.1%) and M1 (97.4% and 93.6%), which use the top 128 4-mer motifs and 36 global descriptors, respectively. The high predictive performance indicates that ProPolyII can be beneficial for promoter identification compared with other methods.
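To make the idea of 4-mer motif features concrete, the following is a minimal Python sketch of a k-mer composition vector for one DNA sequence. It is an illustration only, not the NPC feature extraction or feature selection used by ProPolyII: the function name `kmer_composition`, the normalization by window count, and the choice to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq, k=4):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration; not the ProPolyII NPC encoding.
    """
    seq = seq.upper()
    # All 4**k possible k-mers over the DNA alphabet, e.g. 256 for k=4.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Assumption: windows with ambiguous bases (such as N) are skipped.
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


features = kmer_composition("ACGTACGTAC")
print(len(features))
```

A vector of this kind, computed per sequence, could serve as input to an SVM classifier. The abstract's method additionally combines nucleotide properties and global descriptors and keeps only a selected subset of features, which this sketch does not attempt.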