Genomes containing duplicates are hard to compare

  • Authors:
  • Cedric Chauve;Guillaume Fertin;Romeo Rizzi;Stéphane Vialette

  • Affiliations:
  • LaCIM, CGL, Département d'Informatique, Université du Québec À Montréal CP 8888, Montréal, QC, Canada;Laboratoire d'Informatique de Nantes-Atlantique (LINA), FRE CNRS 2729 Université de Nantes, Nantes Cedex 3, France;Dipartimento di Matematica e Informatica, Università di Udine, Italy;Laboratoire de Recherche en Informatique (LRI), UMR CNRS 8623, Faculté des Sciences d'Orsay, Université Paris-Sud, Orsay, France

  • Venue:
  • ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II
  • Year:
  • 2006

Abstract

In this paper, we study the algorithmic complexity of computing (dis)similarity measures between two genomes that contain duplicated genes. In that case, there are usually two main ways to compute a given (dis)similarity measure M between two genomes G1 and G2. The first model, which we call the matching model, consists in computing a one-to-one correspondence between genes of G1 and genes of G2, in such a way that M is optimized in the resulting permutation. The second model, called the exemplar model, consists in keeping in G1 (resp. G2) exactly one copy of each gene, thus deleting all the other copies, in such a way that M is optimized in the resulting permutation. We present several results on the algorithmic complexity of computing three measures (number of common intervals, MAD number and SAD number) in those two models, showing that each problem becomes NP-complete as soon as genomes contain duplicates. For MAD and SAD, we further prove that the problems are APX-hard under both models.
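To make the exemplar model concrete, the following Python sketch enumerates every way of keeping one copy per gene in each genome and returns the largest number of common intervals. It is an illustration under stated assumptions, not the authors' method: genomes are modelled as unsigned lists of gene identifiers, a common interval is taken to be any gene set (singletons and the full set included) that is contiguous in both sequences, and the function names `common_intervals`, `exemplarizations` and `best_exemplar_common_intervals` are hypothetical. The search is exhaustive and therefore exponential in the number of duplicated genes, which is in line with the hardness results stated in the abstract.

```python
from itertools import product


def common_intervals(p1, p2):
    """Count gene sets that are contiguous in both duplicate-free sequences.

    Assumed convention: singletons and the whole gene set are counted.
    """
    pos = {gene: i for i, gene in enumerate(p2)}
    n = len(p1)
    count = 0
    for i in range(n):
        lo = hi = pos[p1[i]]
        for j in range(i, n):
            # Track the span that p1[i..j] occupies in p2.
            lo = min(lo, pos[p1[j]])
            hi = max(hi, pos[p1[j]])
            # The set is contiguous in p2 iff its span has the same length.
            if hi - lo == j - i:
                count += 1
    return count


def exemplarizations(genome):
    """Yield every sequence obtained by keeping exactly one copy of each gene."""
    positions = {}
    for i, gene in enumerate(genome):
        positions.setdefault(gene, []).append(i)
    for choice in product(*positions.values()):
        # Kept copies stay in their original relative order.
        yield [genome[i] for i in sorted(choice)]


def best_exemplar_common_intervals(g1, g2):
    """Brute-force the exemplar model for the number of common intervals."""
    if set(g1) != set(g2):
        raise ValueError("this sketch assumes both genomes share the same gene set")
    return max(
        common_intervals(a, b)
        for a in exemplarizations(g1)
        for b in exemplarizations(g2)
    )


if __name__ == "__main__":
    # Gene 1 is duplicated in the first genome; keeping its second copy
    # gives [2, 1, 3], the reverse of [3, 1, 2].
    print(best_exemplar_common_intervals([1, 2, 1, 3], [3, 1, 2]))
```

In the small example, keeping the first copy of gene 1 yields `[1, 2, 3]`, while keeping the second yields `[2, 1, 3]`; the sketch picks whichever choice scores higher. The matching model could be explored the same way by enumerating one-to-one pairings of gene copies instead of deletions.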