Domain architecture in homolog identification

  • Authors:
  • N. Song;R. D. Sedgewick;D. Durand

  • Affiliations:
  • Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA (all authors)

  • Venue:
  • RCG'06 Proceedings of the RECOMB 2006 international conference on Comparative Genomics
  • Year:
  • 2006

Abstract

Homology identification is the first step in many genomic studies. Current methods, which are based on sequence comparison, can produce a substantial number of misassignments because homologous domains align in otherwise unrelated sequences. Here we propose methods that detect homologs through explicit comparison of domain architecture. We developed several schemes for scoring the similarity of a pair of protein sequences, exploiting an analogy between comparing proteins by their domain content and comparing documents by their word content. We evaluate the proposed methods on a benchmark of fifteen sequence families of known evolutionary history. The results demonstrate the effectiveness of comparing domain architectures using these similarity measures. They also demonstrate the importance both of weighting critical domains and of compensating for proteins with large numbers of domains.
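
The document analogy in the abstract can be illustrated with a small sketch. The abstract does not specify the scoring schemes, so the code below is an assumption rather than the authors' method: it treats each protein as a bag of domains, weights each domain by inverse document frequency (one standard way to down-weight domains that occur in many proteins), and uses cosine similarity so that proteins with many domains are length-normalized. The function names, protein identifiers, and domain assignments are hypothetical toy data.

```python
import math
from collections import Counter


def idf_weights(proteins):
    """Inverse document frequency of each domain over a set of proteins.

    `proteins` maps a protein id to its list of domain names.
    Domains found in many proteins receive low weight.
    """
    n = len(proteins)
    doc_freq = Counter()
    for domains in proteins.values():
        doc_freq.update(set(domains))  # count each domain once per protein
    return {d: math.log(n / doc_freq[d]) for d in doc_freq}


def weighted_vector(domains, idf):
    """Sparse vector: domain count multiplied by the domain's IDF weight."""
    counts = Counter(domains)
    return {d: c * idf.get(d, 0.0) for d, c in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(d, 0.0) for d, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def architecture_similarity(a, b, proteins):
    """Similarity of two proteins based only on their domain content."""
    idf = idf_weights(proteins)
    return cosine_similarity(
        weighted_vector(proteins[a], idf),
        weighted_vector(proteins[b], idf),
    )


# Hypothetical toy data: protein id -> domains in that protein.
proteins = {
    "P1": ["Kinase", "SH2", "SH3"],
    "P2": ["Kinase", "SH2"],
    "P3": ["Kinase", "Ig", "Ig", "Ig"],
    "P4": ["Ig", "Ig", "FN3"],
}

print(architecture_similarity("P1", "P2", proteins))
print(architecture_similarity("P1", "P3", proteins))
```

In this toy set, P1 and P3 share only the Kinase domain, which occurs in three of the four proteins and therefore carries little weight, so the pair scores lower than P1 and P2, which also share the rarer SH2 domain. This sketch ignores domain order and uses no sequence information.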