OMA, a comprehensive, automated project for the identification of orthologs from complete genome data: introduction and first achievements

  • Authors:
  • Christophe Dessimoz;Gina Cannarozzi;Manuel Gil;Daniel Margadant;Alexander Roth;Adrian Schneider;Gaston H. Gonnet

  • Affiliations:
  • Institute of Computational Science, ETH Zurich, Zürich (all authors)

  • Venue:
  • RCG'05 Proceedings of the 2005 international conference on Comparative Genomics
  • Year:
  • 2005

Abstract

The OMA project is a large-scale effort to identify groups of orthologs from complete genome data, currently covering 150 species. The algorithm relies solely on protein sequence information and requires no human supervision. It has several original features, in particular a verification step that detects paralogs and prevents them from being clustered together. Consistency checks and verification are performed throughout the process. Wherever a comparison could be made, the resulting groups are highly consistent both with EC assignments and with assignments from the manually curated database HAMAP. A highly accurate set of orthologous sequences constitutes the basis for several other investigations, including phylogenetic analysis and protein classification.
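
As an illustration of what a consistency comparison against EC assignments could look like, the following Python sketch computes the fraction of groups whose EC-annotated members all share a single EC number. It is not the procedure used by the OMA project, which the abstract does not describe. The function name `ec_consistency`, the input formats, and the rule that a group is comparable only when at least two of its members carry an EC number are all assumptions made for this example.

```python
def ec_consistency(groups, ec_of):
    """Fraction of comparable groups whose annotated members share one EC number.

    groups: iterable of iterables of protein IDs (hypothetical input format).
    ec_of:  dict mapping protein ID -> EC number string; proteins without an
            EC annotation are simply absent from the dict (assumption).
    Returns None when no group can be compared.
    """
    comparable = 0
    consistent = 0
    for group in groups:
        # Keep only the members that carry an EC annotation.
        ecs = [ec_of[p] for p in group if p in ec_of]
        if len(ecs) < 2:
            continue  # fewer than two annotations: no comparison possible
        comparable += 1
        if len(set(ecs)) == 1:
            consistent += 1
    return consistent / comparable if comparable else None


# Toy data, invented purely for illustration.
example_groups = [["a1", "b1", "c1"], ["a2", "b2"], ["a3", "b3"]]
example_ec = {
    "a1": "1.1.1.1", "b1": "1.1.1.1",  # agree
    "a2": "2.7.1.1", "b2": "3.1.1.1",  # disagree
    "a3": "4.1.1.1",                   # only one annotation: skipped
}
print(ec_consistency(example_groups, example_ec))  # 0.5 for this toy data
```

Groups with fewer than two annotated members are skipped rather than counted as consistent, which mirrors the abstract's qualifier that agreement is reported only where a comparison could be made.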