Artificial neural system for gene classification using a domain database

Authors:
Cathy Wu;G. M. Whitson
Affiliations:
Department of Mathematics and Computer Science, The University of Texas at Tyler, Tyler, Texas;Department of Mathematics and Computer Science, The University of Texas at Tyler, Tyler, Texas
Venue:
CSC '90 Proceedings of the 1990 ACM annual conference on Cooperation
Year:
1990

Abstract

Database search for molecular sequence homologies is the most direct computational approach to deciphering gene sequences for protein structure and function. However, the rapid accumulation of sequence data has made the search increasingly difficult with traditional algorithms and pattern-matching software. Our approach to this problem is to develop a domain artificial neural system (ANS) for gene classification by combining a new database design with neural network theory. A new database, consisting only of identifiable protein domain classes, should be created to reduce the size of the molecular database to be searched. Each entry of the database should contain the domain consensus sequence and other domain features, both of which can be compiled from the NBRF-PIR protein sequence database. The domain database would then be embedded in a neural network so that the search problem is replaced by a pattern recognition problem. The domain ANS would be a three-layered network trained with the back-propagation learning algorithm. The system takes two inputs: the sequence string, which is mapped onto 400 units by a hashing function, and the sequence features, which are mapped onto 36 units. The outputs of the system are identification tags for each of the domain classes. A prototype domain ANS is being developed to map the consensus sequence and sequence features of each of the 41 domains in the training sets to its domain class. Once trained, the domain ANS would allow easy, rapid gene classification for both intra-class and inter-class protein domains.
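
The architecture outlined in the abstract (400 hashed sequence units plus 36 feature units feeding a three-layered back-propagation network with one output tag per domain class) can be illustrated with a minimal sketch. The abstract does not define the hashing function, the hidden-layer size, the activation function, or the error measure, so everything below beyond the unit counts is an assumption made for illustration: the hash counts adjacent residue pairs (20 x 20 = 400 bins), the hidden layer has 30 units, units are sigmoid, the error is squared error, and each target tag is a one-hot vector over 41 classes. The names `hash_sequence`, `encode`, and `ThreeLayerNet` are hypothetical, as are the example sequence and feature values.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"          # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_SEQ_UNITS = 400    # sequence input units (from the abstract)
N_FEAT_UNITS = 36    # feature input units (from the abstract)
N_CLASSES = 41       # domain classes in the prototype (from the abstract)
N_HIDDEN = 30        # ASSUMPTION: hidden-layer size is not given in the abstract


def hash_sequence(seq):
    """ASSUMED hash: count adjacent residue pairs into 400 bins, scaled to [0, 1]."""
    units = [0.0] * N_SEQ_UNITS
    for a, b in zip(seq, seq[1:]):
        if a in AA_INDEX and b in AA_INDEX:   # skip non-standard symbols
            units[AA_INDEX[a] * 20 + AA_INDEX[b]] += 1.0
    peak = max(units)
    return [u / peak for u in units] if peak > 0 else units


def encode(seq, features):
    """Concatenate the 400 hashed sequence units with the 36 feature units."""
    if len(features) != N_FEAT_UNITS:
        raise ValueError("expected %d feature values" % N_FEAT_UNITS)
    return hash_sequence(seq) + list(features)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input, hidden, and output layers trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra entry that acts as the bias.
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = hidden + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One gradient step on squared error; returns the error before the update."""
        hidden, out = self.forward(x)
        xb, hb = x + [1.0], hidden + [1.0]
        # Output deltas, then hidden deltas propagated back through w2.
        delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(out, target)]
        delta_hid = [h * (1.0 - h) * sum(d * self.w2[k][j]
                                         for k, d in enumerate(delta_out))
                     for j, h in enumerate(hidden)]
        for k, d in enumerate(delta_out):
            self.w2[k] = [w - lr * d * v for w, v in zip(self.w2[k], hb)]
        for j, d in enumerate(delta_hid):
            self.w1[j] = [w - lr * d * v for w, v in zip(self.w1[j], xb)]
        return sum((o - t) ** 2 for o, t in zip(out, target)) / 2.0


# Usage with made-up data: one sequence, all-zero features, class index 3.
x = encode("MKVLAAGIVGL", [0.0] * N_FEAT_UNITS)
target = [0.0] * N_CLASSES
target[3] = 1.0
net = ThreeLayerNet(N_SEQ_UNITS + N_FEAT_UNITS, N_HIDDEN, N_CLASSES)
errors = [net.train_step(x, target) for _ in range(20)]
_, outputs = net.forward(x)
predicted_class = outputs.index(max(outputs))
```

Under these assumptions, classifying a new sequence costs a single forward pass through the trained weights rather than a scan of the database, which is the sense in which the search problem is replaced by a pattern recognition problem.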