Genetic sequence classification and its application to cross-species homology detection

  • Authors:
  • Snehasis Mukhopadhyay;Changhong Tang;Jeffery Huang;Mathew Palakal

  • Affiliations:
  • Department of Computer and Information Science, Indiana University, Purdue University Indianapolis, 723 W. Michigan St., SL280, Indianapolis, IN (all authors)

  • Venue:
  • Journal of VLSI Signal Processing Systems - Special issue on signal processing and neural networks for bioinformatics
  • Year:
  • 2003

Abstract

Although large-scale classification studies of genetic sequence data are in progress around the world, very few studies compare different classification approaches (e.g., unsupervised and supervised) in terms of objective criteria such as classification accuracy and computational complexity. In this paper, we study such criteria for both unsupervised and supervised classification of a relatively large sequence data set. The unsupervised approach uses different sequence alignment algorithms (e.g., Smith-Waterman, FASTA, and BLAST) followed by clustering with the Maximin algorithm. The supervised approach uses a numeric encoding (relative frequencies of tuples of nucleotides, followed by principal component analysis), which is fed to a multi-layer backpropagation neural network. Classification experiments conducted on IBM-SP parallel computers show that FASTA with unsupervised Maximin clustering gives the best trade-off between accuracy and speed among all methods, with supervised neural networks second best. Finally, the different classifiers are applied to the problem of cross-species homology detection.
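
To make two of the building blocks named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It shows (a) a relative-frequency encoding of nucleotide tuples and (b) a common textbook form of maximin-distance clustering. Several things here are assumptions: the function names `tuple_frequencies` and `maximin_cluster`, the tuple length `k=2`, the `threshold` fraction, and the choice of the first point as the initial cluster center. The paper clusters on similarities derived from sequence alignment; Euclidean distance between tuple-frequency vectors is used here only as a stand-in so that the example stays self-contained. The principal component analysis and neural network steps are omitted.

```python
from collections import Counter
from itertools import product
from math import dist


def tuple_frequencies(seq, k=2, alphabet="ACGT"):
    """Relative frequency of every length-k nucleotide tuple in seq."""
    tuples = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[t] for t in tuples)
    return [counts[t] / total if total else 0.0 for t in tuples]


def maximin_cluster(points, threshold=0.5, distance=dist):
    """Maximin-distance clustering; returns (center indices, labels)."""
    centers = [0]  # assumption: the first point seeds the first cluster
    while len(centers) < len(points):
        # Distance from each point to its nearest existing center.
        nearest = [min(distance(p, points[c]) for c in centers) for p in points]
        # The point farthest from all current centers is the next candidate.
        candidate = max(range(len(points)), key=nearest.__getitem__)
        if len(centers) == 1:
            scale = 0.0  # any distinct farthest point becomes the second center
        else:
            pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
            # Average distance between the centers found so far.
            scale = sum(distance(points[a], points[b]) for a, b in pairs) / len(pairs)
        # Stop when the candidate is not far enough away to justify a new cluster.
        if nearest[candidate] <= threshold * scale:
            break
        centers.append(candidate)
    # Assign every point to its nearest center.
    labels = [min(range(len(centers)), key=lambda j: distance(p, points[centers[j]]))
              for p in points]
    return centers, labels


# Toy sequences (hypothetical data, for illustration only).
seqs = ["ACACACACAC", "CACACACACA", "GGGGTGGGGG", "GGGGGGTGGG"]
vectors = [tuple_frequencies(s, k=2) for s in seqs]
centers, labels = maximin_cluster(vectors)
print(centers, labels)
```

In this sketch the number of clusters is not fixed in advance: a new center is created only while the farthest remaining point lies more than `threshold` times the average inter-center distance from every existing center. Any other distance function, for example one derived from pairwise alignment scores, could be passed in through the `distance` parameter.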