Approximation of RNA multiple structural alignment

  • Authors:
  • Marcin Kubica;Romeo Rizzi;Stéphane Vialette;Tomasz Waleń

  • Affiliations:
  • Institute of Informatics, Warsaw University, Warszawa, Poland;Dipartimento di Matematica ed Informatica (DIMI), Università di Udine, Udine, Italy;Laboratoire de Recherche en Informatique (LRI), UMR CNRS 8623, Faculté des Sciences d'Orsay, Université Paris-Sud, Orsay, France;Institute of Informatics, Warsaw University, Warszawa, Poland

  • Venue:
  • CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
  • Year:
  • 2006

Abstract

In the context of non-coding RNA (ncRNA) multiple structural alignment, Davydov and Batzoglou introduced in [7] the problem of finding the largest nested linear graph that occurs in a set ${\mathcal{G}}$ of linear graphs, the so-called Max-NLS problem. This problem generalizes both the longest common subsequence problem and the maximum common homeomorphic subtree problem for rooted ordered trees. In the present paper, we give a fast algorithm for finding the largest nested linear subgraph of a linear graph and a polynomial-time algorithm for a fixed number $k$ of linear graphs. We also substantially strengthen the result of [7] by proving that the problem is NP-complete even if ${\mathcal{G}}$ is composed of nested linear graphs of height at most 2, thereby precisely defining the borderline between tractable and intractable instances of the problem. Of particular importance, we improve the result of [7] by showing that the Max-NLS problem is approximable within ratio ${\mathcal O}(\log m_{opt})$ in ${\mathcal O}(kn^2)$ running time, where $m_{opt}$ is the size of an optimal solution. We also present an ${\mathcal O}(1)$-approximation for the Max-NLS problem running in ${\mathcal O}(kn)$ time for restricted linear graphs. In particular, for ncRNA-derived linear graphs, a $\frac{1}{4}$-approximation is presented.
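
To make the single-graph case concrete, the following is a minimal sketch of a textbook interval dynamic program for selecting a largest nested subset of edges from one linear graph. It is not the fast algorithm of the paper, whose details the abstract does not give. It assumes that vertices are numbered 0..n-1 in line order and that "nested" means the chosen edges are pairwise vertex-disjoint and non-crossing. The function name `largest_nested_subgraph` and the edge-list input format are hypothetical choices made for illustration.

```python
def largest_nested_subgraph(n, edges):
    """Return the size of a largest subset of edges that is pairwise
    vertex-disjoint and non-crossing (assumed meaning of "nested").

    n     -- number of vertices, labelled 0..n-1 in line order
    edges -- iterable of (u, v) pairs; orientation is ignored
    """
    # partners[a] lists the right endpoints of edges whose left endpoint is a.
    partners = [[] for _ in range(n)]
    for u, v in edges:
        a, b = min(u, v), max(u, v)
        if a != b:  # ignore self-loops
            partners[a].append(b)

    # best[i][j] = optimum restricted to vertices i..j-1 (half-open interval).
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Option 1: vertex i is not an endpoint of any chosen edge.
            value = best[i + 1][j]
            # Option 2: choose edge (i, k); the rest splits into the part
            # strictly inside the edge and the part strictly to its right.
            for k in partners[i]:
                if k < j:
                    value = max(value, 1 + best[i + 1][k] + best[k + 1][j])
            best[i][j] = value
    return best[0][n]
```

For example, `largest_nested_subgraph(6, [(0, 5), (1, 2), (3, 4), (1, 4)])` selects the edges (0, 5), (1, 2) and (3, 4). This sketch runs in time proportional to n times the number of edges plus n squared, and it handles only a single graph; finding a nested graph common to all of ${\mathcal{G}}$ is the harder problem the paper addresses.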