Faster Algorithm for the Set Variant of the String Barcoding Problem

  • Authors:
  • Leszek Gąsieniec;Cindy Y. Li;Meng Zhang

  • Affiliations:
  • Department of Computer Science, University of Liverpool, Liverpool, UK;Histocompatibility and Immunogenetics Laboratory, National Blood Service, Bristol, UK;College of Computer Science and Technology, Jilin University, Changchun, China

  • Venue:
  • CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
  • Year:
  • 2008

Abstract

The string barcoding problem is to find a minimum set of substrings that distinguishes between all strings in a given set of strings ${\cal S}$. In a biological setting, the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings must be chosen from a given set of substrings of cardinality $n$. This variant can also be obtained from the more general test set problem (see, e.g., [1]) by fixing appropriate parameters. We present an almost optimal $O(n|{\cal S}|\log^3 n)$-time approximation algorithm for the considered problem. Our approximation procedure is a modification of the algorithm due to Berman et al. [1], which obtains the best possible approximation ratio $(1 + \ln n)$, provided that $NP\not\subseteq DTIME(n^{\log\log n})$. The improved time complexity is a direct consequence of more careful management of the processed sets, the use of several specialised graph and string data structures, and a tighter time complexity analysis based on an amortised argument.
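To make the set variant concrete, the following is a minimal Python sketch of a plain greedy heuristic: at each step it picks the candidate substring that separates the largest number of still-undistinguished string pairs. This is not the algorithm of the paper or of Berman et al. [1], and it carries neither the $(1 + \ln n)$ ratio nor the $O(n|{\cal S}|\log^3 n)$ running time; it only illustrates the problem statement. The function name `greedy_barcode` and the sample data are hypothetical.

```python
from itertools import combinations


def greedy_barcode(strings, candidates):
    """Greedily pick candidate substrings until every pair of strings differs.

    Illustrative only: a naive pair-counting greedy, not the paper's method.
    """
    # occurs[j][i] is True when candidates[j] is a substring of strings[i].
    occurs = [[c in s for s in strings] for c in candidates]
    # Pairs of string indices that no chosen substring separates yet.
    undistinguished = set(combinations(range(len(strings)), 2))
    chosen = []
    while undistinguished:
        best, best_pairs = None, set()
        for j, row in enumerate(occurs):
            # A candidate separates a pair if it occurs in exactly one of the two.
            pairs = {(a, b) for (a, b) in undistinguished if row[a] != row[b]}
            if len(pairs) > len(best_pairs):
                best, best_pairs = j, pairs
        if best is None:
            raise ValueError("candidates cannot distinguish all strings")
        chosen.append(candidates[best])
        undistinguished -= best_pairs
    return chosen


# Hypothetical example: four short sequences and five candidate probes.
strings = ["acgt", "aacc", "ggtt", "ttaa"]
candidates = ["ac", "gt", "aa", "tt", "cc"]
barcode = greedy_barcode(strings, candidates)
```

With this input, the presence or absence of each chosen substring gives every sequence a distinct binary signature, which is the "barcode" the problem asks for. The sketch recomputes pair counts from scratch in every round; the paper's contribution lies in avoiding that kind of repeated work.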