Biomolecular data mining is the activity of finding significant information in protein, DNA, and RNA molecules. Such information may include motifs, clusters, genes, protein signatures, and classification rules. This chapter presents an example of biomolecular data mining: the recognition of promoters in DNA. We propose a two-level ensemble of classifiers to recognize E. coli promoter sequences. The first level comprises three Bayesian neural networks, each trained on a different feature set. The outputs of the first-level classifiers are combined at the second level to produce the final prediction. To improve the recognition rate, we exploit background knowledge (i.e., known characteristics of promoter sequences) and employ new techniques to extract high-level features from the sequences. We also use an expectation-maximization (EM) algorithm to locate the binding sites within the promoter sequences. An empirical study shows that the approach achieves a precision of 95%, demonstrating the effectiveness of the proposed approach.
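The two-level ensemble described above can be sketched in outline as follows. This is a minimal illustration, not the chapter's implementation: the toy linear scorers stand in for the three Bayesian neural networks, the feature values and weights are invented for demonstration, and the second-level combiner is shown as a simple weighted average (the actual combiner may differ).

```python
import math

def level1_score(features, weights, bias):
    """Toy linear scorer standing in for one first-level Bayesian
    neural network; returns a promoter probability in (0, 1)."""
    s = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-s))  # logistic squashing

def level2_combine(probs, combiner_weights):
    """Second level: weighted average of the first-level outputs."""
    s = sum(w * p for w, p in zip(combiner_weights, probs))
    return s / sum(combiner_weights)

# Three different feature sets extracted from one sequence
# (illustrative values only):
feature_sets = [
    [1.0, 0.2],        # e.g. -35/-10 box match scores
    [0.7, 0.1, 0.4],   # e.g. spacer-length / composition features
    [0.9],             # e.g. EM-derived binding-site alignment score
]
# One (weights, bias) pair per first-level classifier (also invented):
level1_params = [
    ([2.0, 1.0], -0.5),
    ([1.5, 0.5, 1.0], -0.8),
    ([3.0], -1.0),
]

probs = [level1_score(f, w, b)
         for f, (w, b) in zip(feature_sets, level1_params)]
p_promoter = level2_combine(probs, combiner_weights=[1.0, 1.0, 1.0])
label = "promoter" if p_promoter >= 0.5 else "non-promoter"
print(label, round(p_promoter, 3))
```

The key design point is that each first-level classifier sees only its own feature set, so their errors tend to be less correlated, and the second level only has to learn how much to trust each of the three probability streams.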