On the closest string via rank distance

  • Authors:
  • Liviu P. Dinu; Alexandru Popa

  • Affiliations:
  • Faculty of Mathematics and Computer Science, University of Bucharest, Bucharest, Romania; Department of Communications & Networking, Aalto University School of Electrical Engineering, Aalto, Finland

  • Venue:
  • CPM'12: Proceedings of the 23rd Annual Conference on Combinatorial Pattern Matching
  • Year:
  • 2012

Abstract

Given a set S of k strings of maximum length n, the goal of the closest substring problem (CSSP) is to find the smallest integer d (and a corresponding string t of length ℓ≤n) such that each string s∈S has a substring of length ℓ at "distance" at most d from t. The closest string problem (CSP) is the special case of CSSP where ℓ=n. CSP and CSSP arise in many applications in bioinformatics and have been studied extensively in the context of Hamming and edit distance. In this paper we consider a recently introduced distance measure, the rank distance. First, we show that CSP and CSSP via rank distance are NP-hard. Then, we present a polynomial-time k-approximation algorithm for CSP. Finally, we give a parameterized algorithm for CSP (the parameter is the number of input strings) for the case where the alphabet is binary and each string has the same number of 0's and 1's.
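
To make the problem statement concrete, the following is a minimal brute-force sketch of the CSSP objective, not the algorithms of the paper. It enumerates every candidate string t of length ℓ over a given alphabet, so it runs in exponential time and is only useful for tiny inputs. All function names are hypothetical. The abstract does not define the rank distance; the rank_distance function below assumes the usual definition from the rank-distance literature (each character is indexed by its occurrence number, and the distance sums the absolute differences of the 1-based positions of shared indexed characters plus the positions of unshared ones). The default distance is the Hamming distance.

```python
from collections import defaultdict
from itertools import product


def hamming(a, b):
    """Hamming distance between two strings of equal length."""
    return sum(x != y for x, y in zip(a, b))


def indexed_positions(s):
    """Map each indexed character (char, occurrence number) to its 1-based position."""
    seen = defaultdict(int)
    pos = {}
    for i, ch in enumerate(s, start=1):
        seen[ch] += 1
        pos[(ch, seen[ch])] = i
    return pos


def rank_distance(u, v):
    """Rank distance under the ASSUMED definition described in the lead-in.

    An indexed character missing from one string contributes its position
    in the other string (a missing position is treated as 0).
    """
    pu, pv = indexed_positions(u), indexed_positions(v)
    return sum(abs(pu.get(key, 0) - pv.get(key, 0)) for key in pu.keys() | pv.keys())


def radius(t, strings, dist=hamming):
    """Smallest d such that every string has a length-len(t) substring within d of t.

    Assumes every input string has length at least len(t).
    """
    ell = len(t)
    return max(
        min(dist(s[i:i + ell], t) for i in range(len(s) - ell + 1))
        for s in strings
    )


def closest_substring(strings, ell, alphabet, dist=hamming):
    """Exhaustive search over all |alphabet|**ell candidates; returns (d, t)."""
    candidates = ("".join(c) for c in product(alphabet, repeat=ell))
    best = min(candidates, key=lambda t: radius(t, strings, dist))
    return radius(best, strings, dist), best
```

Calling closest_substring with ell equal to the common length of the input strings gives the CSP special case; passing dist=rank_distance swaps in the assumed rank distance without changing the search.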