On the approximability of comparing genomes with duplicates

Authors:
Sébastien Angibaud;Guillaume Fertin;Irena Rusu
Affiliations:
Laboratoire d'Informatique de Nantes-Atlantique, FRE, CNRS, Université de Nantes, Nantes Cedex 3, France;Laboratoire d'Informatique de Nantes-Atlantique, FRE, CNRS, Université de Nantes, Nantes Cedex 3, France;Laboratoire d'Informatique de Nantes-Atlantique, FRE, CNRS, Université de Nantes, Nantes Cedex 3, France
Venue:
WALCOM'08 Proceedings of the 2nd international conference on Algorithms and computation
Year:
2008

References (10)

Some APX-completeness results for cubic graphs. Theoretical Computer Science.
Genomic distances under deletions and insertions. Theoretical Computer Science - Special papers from: COCOON 2003.
Power boosts for cluster tests. RCG'05 Proceedings of the 2005 international conference on Comparative Genomics.
The approximability of the exemplar breakpoint distance problem. AAIM'06 Proceedings of the Second international conference on Algorithmic Aspects in Information and Management.
Approximating the 2-interval pattern problem. ESA'05 Proceedings of the 13th annual European conference on Algorithms.
Conserved interval distance computation between non-trivial genomes. COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics.
Minimum common string partition problem: hardness and approximations. ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation.
Genomes containing duplicates are hard to compare. ICCS'06 Proceedings of the 6th international conference on Computational Science - Volume Part II.
Reversal distance for strings with duplicates: linear time approximation using hitting set. WAOA'06 Proceedings of the 4th international conference on Approximation and Online Algorithms.
Non-breaking similarity of genomes with gene repetitions. CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching.

Cited by (4)

The Exemplar Breakpoint Distance for Non-trivial Genomes Cannot Be Approximated. WALCOM '09 Proceedings of the 3rd International Workshop on Algorithms and Computation.
Comparing Bacterial Genomes by Searching Their Common Intervals. BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology.
Approximability and Fixed-Parameter Tractability for the Exemplar Genomic Distance Problems. TAMC '09 Proceedings of the 6th Annual Conference on Theory and Applications of Models of Computation.
An Exact Algorithm for the Zero Exemplar Breakpoint Distance Problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB).

Abstract

A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, SAD, etc. In their initial definitions, all these measures assume that genomes contain no duplicates. However, we now know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between the genes of both genomes, chosen so as to optimize the studied measure. Then, after relabeling the genes according to this matching and deleting the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar model, maximum matching model and non-maximum matching model). We prove that, for each model and each measure, computing a matching between two genomes that optimizes the measure is APX-hard. We show that this result remains true even for two genomes G1 and G2 such that G1 contains no duplicates and no gene of G2 appears more than twice. Therefore, our results extend those of [5,6,8]. Finally, we propose a 4-approximation algorithm for the number of adjacencies, a measure closely related to the number of breakpoints, under the maximum matching model, in the case where the genomes contain the same number of duplications of each gene.
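To make the matching approach concrete, here is a minimal Python sketch; it is not the authors' algorithm. It assumes (none of this is from the paper) that a genome is a list of nonzero signed integers naming gene families, that a matching is a list of (position in G1, position in G2) pairs, that a consecutive pair (x, y) of G1 is an adjacency when G2 contains (x, y) or (-y, -x), that no framing genes are added, and that common intervals ignore signs and have length at least 2. The names `relabel`, `adjacencies`, `breakpoints`, `common_intervals` and `maximum_matchings` are hypothetical. Only the maximum matching model is enumerated, by brute force, which is exponential and usable only on tiny inputs; the APX-hardness results explain why nothing fundamentally better is expected for exact optimization.

```python
from itertools import combinations, permutations, product
from typing import Dict, Iterator, List, Sequence, Tuple

Matching = List[Tuple[int, int]]  # (position in G1, position in G2) pairs


def relabel(g1: Sequence[int], g2: Sequence[int],
            matching: Matching) -> Tuple[List[int], List[int]]:
    """Give the k-th matched pair the fresh label k (each gene keeps its own
    sign) and delete every unmatched gene, giving two duplicate-free genomes."""
    new1: Dict[int, int] = {}
    new2: Dict[int, int] = {}
    for k, (i, j) in enumerate(matching, start=1):
        if i in new1 or j in new2:
            raise ValueError("matching is not one-to-one")
        if abs(g1[i]) != abs(g2[j]):
            raise ValueError(f"positions {i} and {j} hold different gene families")
        new1[i] = k if g1[i] > 0 else -k
        new2[j] = k if g2[j] > 0 else -k
    r1 = [new1[i] for i in range(len(g1)) if i in new1]
    r2 = [new2[j] for j in range(len(g2)) if j in new2]
    return r1, r2


def adjacencies(a: Sequence[int], b: Sequence[int]) -> int:
    """Consecutive pairs (x, y) of a such that b contains (x, y) or (-y, -x)."""
    pairs_b = set()
    for x, y in zip(b, b[1:]):
        pairs_b.add((x, y))
        pairs_b.add((-y, -x))
    return sum((x, y) in pairs_b for x, y in zip(a, a[1:]))


def breakpoints(a: Sequence[int], b: Sequence[int]) -> int:
    """Consecutive pairs of a that are not adjacencies (assumption: no framing genes)."""
    return max(len(a) - 1, 0) - adjacencies(a, b)


def common_intervals(a: Sequence[int], b: Sequence[int]) -> int:
    """Intervals of length >= 2 of a whose unsigned gene set is also the gene
    set of an interval of b (a is duplicate-free, so sets identify intervals)."""
    def interval_sets(g: Sequence[int]) -> set:
        u = [abs(x) for x in g]
        return {frozenset(u[i:j + 1])
                for i in range(len(u)) for j in range(i + 1, len(u))}
    return len(interval_sets(a) & interval_sets(b))


def maximum_matchings(g1: Sequence[int], g2: Sequence[int]) -> Iterator[Matching]:
    """Enumerate every maximum matching: for each gene family, min(occ1, occ2)
    occurrences are paired. Exponential; only for tiny examples."""
    per_family = []
    for f in sorted({abs(x) for x in g1} | {abs(x) for x in g2}):
        p1 = [i for i, x in enumerate(g1) if abs(x) == f]
        p2 = [j for j, x in enumerate(g2) if abs(x) == f]
        m = min(len(p1), len(p2))
        per_family.append([list(zip(c1, c2))
                           for c1 in combinations(p1, m)
                           for c2 in permutations(p2, m)])
    for choice in product(*per_family):
        yield [pair for pairs in choice for pair in pairs]


if __name__ == "__main__":
    G1 = [1, 2, 3, 4]        # no duplicates
    G2 = [2, 1, 2, -3, 4]    # gene family 2 occurs twice
    best = max(maximum_matchings(G1, G2),
               key=lambda m: adjacencies(*relabel(G1, G2, m)))
    r1, r2 = relabel(G1, G2, best)
    # Matching G1's gene 2 with the second copy in G2 yields r2 = [1, 2, -3, 4]:
    # one adjacency (1, 2) and two breakpoints.
    print(best, r1, r2, adjacencies(r1, r2), breakpoints(r1, r2))
```

The toy instance mirrors the restricted case in the abstract (G1 duplicate-free, each gene of G2 at most twice). Trying both copies of gene 2 shows why the choice of matching matters: the other maximum matching relabels G2 as [2, 1, -3, 4] and leaves no adjacency at all, and 4 common intervals instead of 6.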