Mining biomolecular data using background knowledge and artificial neural networks

  • Authors:
  • Qicheng Ma; Jason T. L. Wang; James R. Gattiker

  • Affiliations:
  • Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ; Department of Computer and Information Science, New Jersey Institute of Technology, Newark, NJ; Los Alamos National Laboratory, Mail Stop E541, Los Alamos, NM

  • Venue:
  • Handbook of Massive Data Sets
  • Year:
  • 2002

Abstract

Biomolecular data mining is the task of finding significant information in protein, DNA, and RNA molecules. Such information includes motifs, clusters, genes, protein signatures, and classification rules. This chapter presents one example of biomolecular data mining: the recognition of promoters in DNA. We propose a two-level ensemble of classifiers to recognize E. coli promoter sequences. The first level consists of three Bayesian neural networks, each of which learns from a different feature set. The second level combines the outputs of the first-level classifiers to give the final result. To improve the recognition rate, we use background knowledge (i.e., the characteristics of promoter sequences) and employ new techniques to extract high-level features from the sequences. We also use an expectation-maximization (EM) algorithm to locate the binding sites of the promoter sequences. An empirical study shows that the approach achieves a precision of 95%, indicating that it performs very well.
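
The two-level design described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three first-level scorers are simple hand-written stand-ins (AT content and the best match to the widely cited -10 `TATAAT` and -35 `TTGACA` consensus hexamers), not the chapter's Bayesian neural networks, and the second-level combiner is a logistic function with made-up weights rather than a trained model. All function names and parameter values are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def at_fraction(seq: str) -> float:
    """Toy feature set 1: fraction of A/T bases in the sequence."""
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def best_match(seq: str, motif: str) -> float:
    """Toy feature sets 2 and 3: best fraction of positions matching `motif`
    over all windows of the same length in `seq`."""
    k = len(motif)
    if len(seq) < k:
        return 0.0
    return max(
        sum(a == b for a, b in zip(seq[i:i + k], motif)) / k
        for i in range(len(seq) - k + 1)
    )


# First level: three scorers, each looking at a different feature of the sequence.
# ASSUMPTION: these replace the three Bayesian neural networks for illustration.
FIRST_LEVEL: List[Callable[[str], float]] = [
    at_fraction,
    lambda s: best_match(s, "TATAAT"),  # -10 consensus hexamer
    lambda s: best_match(s, "TTGACA"),  # -35 consensus hexamer
]


def second_level(scores: Sequence[float],
                 weights: Sequence[float] = (1.0, 2.0, 2.0),
                 bias: float = -2.5) -> float:
    """Second level: combine first-level outputs into one score in (0, 1).
    ASSUMPTION: weights and bias are illustrative, not learned from data."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


def promoter_score(seq: str) -> float:
    """Run every first-level scorer, then combine their outputs."""
    seq = seq.upper()
    return second_level([scorer(seq) for scorer in FIRST_LEVEL])


if __name__ == "__main__":
    candidate = "TTGACA" + "ACGTACGTACGTACGTA" + "TATAAT"
    background = "G" * 29
    print(promoter_score(candidate) > promoter_score(background))
```

In a real system the second level would itself be fitted on held-out first-level outputs; the fixed weights here only show how the data flows between the two levels.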