Biomolecular data mining is the activity of finding significant information in protein, DNA, and RNA molecules. Such information may include motifs, clusters, genes, protein signatures, and classification rules. This chapter presents an example of biomolecular data mining: the recognition of promoters in DNA. We propose a two-level ensemble of classifiers to recognize E. coli promoter sequences. The first level comprises three Bayesian neural networks, each trained on a different feature set. The outputs of the first-level classifiers are combined at the second level to produce the final prediction. To improve the recognition rate, we exploit background knowledge (i.e., known characteristics of promoter sequences) and employ new techniques to extract high-level features from the sequences. We also use an expectation-maximization (EM) algorithm to locate the binding sites within the promoter sequences. An empirical study shows that the approach achieves a precision of 95%, demonstrating the effectiveness of the proposed approach.
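The two-level ensemble described above can be sketched in outline as follows. This is a minimal illustration, not the chapter's implementation: the toy linear scorers stand in for the three Bayesian neural networks, the feature values and weights are invented for demonstration, and the second-level combiner is shown as a simple weighted average (the actual combiner may differ).

```python
import math

def level1_score(features, weights, bias):
    """Toy linear scorer standing in for one first-level Bayesian
    neural network; returns a promoter probability in (0, 1)."""
    s = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-s))  # logistic squashing

def level2_combine(probs, combiner_weights):
    """Second level: weighted average of the first-level outputs."""
    s = sum(w * p for w, p in zip(combiner_weights, probs))
    return s / sum(combiner_weights)

# Three different feature sets extracted from one sequence
# (illustrative values only):
feature_sets = [
    [1.0, 0.2],        # e.g. -35/-10 box match scores
    [0.7, 0.1, 0.4],   # e.g. spacer-length / composition features
    [0.9],             # e.g. EM-derived binding-site alignment score
]
# One (weights, bias) pair per first-level classifier (also invented):
level1_params = [
    ([2.0, 1.0], -0.5),
    ([1.5, 0.5, 1.0], -0.8),
    ([3.0], -1.0),
]

probs = [level1_score(f, w, b)
         for f, (w, b) in zip(feature_sets, level1_params)]
p_promoter = level2_combine(probs, combiner_weights=[1.0, 1.0, 1.0])
label = "promoter" if p_promoter >= 0.5 else "non-promoter"
print(label, round(p_promoter, 3))
```

The key design point is that each first-level classifier sees only its own feature set, so their errors tend to be less correlated, and the second level only has to learn how much to trust each of the three probability streams.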