Improving sequence alignment based gene functional annotation with natural language processing and associative clustering

  • Authors:
  • Ji He

  • Affiliations:
  • Department of Scientific Computing, The Samuel Roberts Noble Foundation, Ardmore, OK

  • Venue:
  • ISNN'10 Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part II
  • Year:
  • 2010

Abstract

Sequence alignment has been a commonly adopted technique for annotating gene functions. Biologists typically infer the function of an unknown query gene from the function of the reference subject gene that shows the highest homology (commonly referred to as the “top hit”). BLAST search against the NCBI NR database has been the de facto “golden companion” in many applications. However, the NR database is known to be noisy and to contain significant sequence redundancy, which leads to various complications in the annotation process. This paper proposes an integrative approach that combines natural language processing (NLP) for the feature representation of functional descriptions with a novel artificial neural network, customized from the Adaptive Resonance Associative Map (ARAM), for clustering subject genes and reducing their redundancy. The proposed approach was evaluated in the model legume species Medicago truncatula and was shown to be highly effective in our experiments.
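To make the clustering idea more concrete, here is a minimal sketch. It is not the author's customized ARAM, which the abstract does not specify. ARAM pairs two fuzzy ART modules. This example uses only a single-channel fuzzy ART, which groups subject-gene functional descriptions that have been turned into binary bag-of-words vectors. The tokenizer, the parameter values (`rho`, `alpha`, `beta`) and the example descriptions are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Dict, List

def tokenize(desc: str) -> List[str]:
    # Crude stand-in for NLP preprocessing: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", desc.lower()) if t]

def build_vocab(descs: List[str]) -> Dict[str, int]:
    # Assign each distinct token an index in first-seen order.
    vocab: Dict[str, int] = {}
    for d in descs:
        for t in tokenize(d):
            vocab.setdefault(t, len(vocab))
    return vocab

def featurize(desc: str, vocab: Dict[str, int]) -> List[float]:
    # Binary bag-of-words, then complement coding [x, 1 - x] so every
    # input has the same norm (len(vocab)), as fuzzy ART expects.
    vec = [0.0] * len(vocab)
    for t in tokenize(desc):
        if t in vocab:
            vec[vocab[t]] = 1.0
    return vec + [1.0 - v for v in vec]

def fuzzy_and(a: List[float], b: List[float]) -> List[float]:
    # Fuzzy AND is the element-wise minimum.
    return [min(x, y) for x, y in zip(a, b)]

def fuzzy_art(inputs: List[List[float]], rho: float = 0.8,
              alpha: float = 0.001, beta: float = 1.0) -> List[int]:
    weights: List[List[float]] = []
    labels: List[int] = []
    for x in inputs:
        norm_x = sum(x)
        # Rank existing categories by the choice function |x ^ w| / (alpha + |w|).
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(x, weights[j])) / (alpha + sum(weights[j])),
            reverse=True,
        )
        chosen = None
        for j in order:
            # Vigilance test: the category must match the input closely enough.
            if sum(fuzzy_and(x, weights[j])) / norm_x >= rho:
                chosen = j
                break
        if chosen is None:
            # No category resonates, so create a new one from the input.
            weights.append(list(x))
            chosen = len(weights) - 1
        else:
            # Learning rule; beta = 1.0 is "fast learning" (w <- x ^ w).
            w = weights[chosen]
            weights[chosen] = [beta * m + (1 - beta) * wi
                               for m, wi in zip(fuzzy_and(x, w), w)]
        labels.append(chosen)
    return labels

# Hypothetical subject-gene descriptions, as might appear among BLAST hits.
descs = [
    "putative protein kinase",
    "protein kinase, putative",
    "hypothetical protein",
    "hypothetical protein",
    "ATP synthase subunit alpha",
]
vocab = build_vocab(descs)
labels = fuzzy_art([featurize(d, vocab) for d in descs])
print(labels)  # [0, 0, 1, 1, 2]
```

Here the two reworded kinase descriptions fall into one cluster, and so do the duplicated "hypothetical protein" entries. One representative per cluster could then stand in for its redundant members. The vigilance `rho` controls how coarse the clusters are: higher values give more, tighter clusters.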