Artificial neural system for gene classification using a domain database

  • Authors:
  • Cathy Wu; G. M. Whitson

  • Affiliations:
  • Department of Mathematics and Computer Science, The University of Texas at Tyler, Tyler, Texas (both authors)

  • Venue:
  • CSC '90: Proceedings of the 1990 ACM Annual Conference on Cooperation
  • Year:
  • 1990

Abstract

Database search for molecular sequence homologies is the most direct computational approach to deciphering gene sequences for protein structure and function. However, the rapid accumulation of sequence data has made such searches increasingly difficult with traditional algorithms and pattern-matching software. Our approach is to develop a domain artificial neural system (ANS) for gene classification by combining a new database design with neural network theory. The new database would contain only identifiable protein domain classes, which reduces the size of the molecular database to be searched. Each entry would hold the domain consensus sequence and other domain features, both of which can be compiled from the NBRF-PIR protein sequence database. The domain database would then be embedded in a neural network, so that the search problem is replaced by a pattern recognition problem. The domain ANS would be a three-layer network trained with the back-propagation learning algorithm. The inputs to the system are the sequence string, mapped onto 400 units by a hashing function, and the sequence features, mapped onto 36 units. The outputs are identification tags, one for each domain class. A prototype domain ANS is being developed to map the consensus sequence and sequence features of each of the 41 domains in the training sets to its domain class. Once trained, the domain ANS would allow easy, rapid gene classification for both intra-class and inter-class protein domains.
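The input/output layout described above can be illustrated with a minimal sketch of a three-layer back-propagation classifier. This is not the authors' implementation, and several details are assumptions. The abstract does not say which hashing function produces the 400 sequence units, so here 400 is read as 20 × 20, one unit per ordered pair of standard amino acids (a hypothetical dipeptide-presence hash). The abstract also does not list the 36 sequence features, so the caller supplies them. The hidden-layer size, learning rate, weight initialization, and the toy training strings are likewise hypothetical.

```python
import math
import random

# Assumption: "hashing function" is modeled as a 20 x 20 dipeptide-presence map.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_SEQ_UNITS = 400       # from the abstract
N_FEATURE_UNITS = 36    # from the abstract; feature contents are unspecified
N_CLASSES = 41          # domains in the prototype's training sets


def hash_sequence(seq):
    """Set unit 20*i + j to 1.0 if residue pair (i, j) occurs in seq (hypothetical hash)."""
    units = [0.0] * N_SEQ_UNITS
    seq = seq.upper()
    for a, b in zip(seq, seq[1:]):
        if a in INDEX and b in INDEX:  # skip ambiguous codes such as X or B
            units[INDEX[a] * 20 + INDEX[b]] = 1.0
    return units


def encode(seq, features):
    """Concatenate the 400 sequence units with 36 caller-supplied feature units."""
    if len(features) != N_FEATURE_UNITS:
        raise ValueError("expected %d feature values" % N_FEATURE_UNITS)
    return hash_sequence(seq) + [float(f) for f in features]


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input -> hidden -> output network trained by back-propagation on squared error."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for a bias input fixed at 1.0.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, o

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update; returns the error measured before the update."""
        h, o = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, o)]
        # Hidden deltas use the output weights before they are updated.
        d_hid = [hj * (1.0 - hj) * sum(dk * self.w2[k][j] for k, dk in enumerate(d_out))
                 for j, hj in enumerate(h)]
        for row, dk in zip(self.w2, d_out):
            for j, v in enumerate(hb):
                row[j] += lr * dk * v
        for row, dj in zip(self.w1, d_hid):
            for i, v in enumerate(xb):
                if v:  # zero inputs produce no weight change
                    row[i] += lr * dj * v
        return 0.5 * sum((t - y) ** 2 for t, y in zip(target, o))

    def classify(self, x):
        """Return the index of the most active output unit (the domain class tag)."""
        _, o = self.forward(x)
        return max(range(len(o)), key=lambda k: o[k])


def one_hot(k, n=N_CLASSES):
    return [1.0 if i == k else 0.0 for i in range(n)]


def train_demo(epochs=300):
    """Train on three toy 'consensus' strings (hypothetical, not NBRF-PIR data)."""
    train = [("ACDEFGHIKLACDEFGHIKL", 0), ("MNPQRSTVWYMNPQRSTVWY", 1), ("GAGSGAGSGAGSGAGS", 2)]
    features = [0.0] * N_FEATURE_UNITS  # placeholder feature vector
    net = ThreeLayerNet(N_SEQ_UNITS + N_FEATURE_UNITS, 8, N_CLASSES)
    inputs = [(encode(s, features), one_hot(k)) for s, k in train]
    for _ in range(epochs):
        for x, t in inputs:
            net.train_step(x, t)
    return [net.classify(x) for x, _ in inputs]


if __name__ == "__main__":
    print(train_demo())
```

Because each output unit stands for one domain class, classification reduces to picking the most active output. This is the sense in which a trained network replaces an explicit database search with pattern recognition.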