String barcoding: uncovering optimal virus signatures
Proceedings of the sixth annual international conference on Computational biology
Rapid identification of repeated patterns in strings, trees and arrays
STOC '72 Proceedings of the fourth annual ACM symposium on Theory of computing
DNA-BAR: distinguisher selection for DNA barcoding
Bioinformatics
Tight approximability results for test set problems in bioinformatics
Journal of Computer and System Sciences
Highly scalable algorithms for robust string barcoding
International Journal of Bioinformatics Research and Applications
The string barcoding problem is to find a minimum set of substrings that distinguishes between all strings in a given set of strings ${\cal S}$. In a biological sense, the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen from a particular set of substrings of cardinality $n$. This variant can also be obtained from the more general test set problem (see, e.g., [1]) by fixing appropriate parameters. We present an almost optimal $O(n|{\cal S}|\log^3 n)$-time approximation algorithm for the considered problem. Our approximation procedure is a modification of the algorithm due to Berman et al. [1], which obtains the best possible approximation ratio $(1 + \ln n)$, provided that $NP\not\subseteq DTIME(n^{\log\log n})$. The improved time complexity is a direct consequence of more careful management of the processed sets, the use of several specialised graph and string data structures, and a tighter time complexity analysis based on an amortised argument.
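To make the problem statement concrete, the following is a minimal sketch of the set variant, assuming that a candidate substring distinguishes two strings when it occurs in exactly one of them. It uses a plain greedy rule (pick the candidate that separates the most still-unseparated pairs). This is not the algorithm of Berman et al. [1] or the faster procedure described above, and it makes no claim to the $(1 + \ln n)$ ratio or the stated running time. The function names `greedy_barcode` and `barcodes` and the toy sequences are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def greedy_barcode(strings, candidates):
    """Greedily choose candidate substrings until every pair of strings differs.

    Assumption: a candidate distinguishes a pair of strings when it occurs
    in exactly one of the two. Plain greedy heuristic, illustration only.
    """
    # Pairs of string indices that have not been told apart yet.
    undistinguished = set(combinations(range(len(strings)), 2))
    # For each candidate, the indices of the strings that contain it.
    occurs = {
        c: frozenset(i for i, s in enumerate(strings) if c in s)
        for c in candidates
    }

    def gain(c):
        # Number of remaining pairs that candidate c would separate.
        return sum((i in occurs[c]) != (j in occurs[c])
                   for i, j in undistinguished)

    chosen = []
    while undistinguished:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("candidates cannot distinguish all strings")
        chosen.append(best)
        # Keep only the pairs that the chosen candidate fails to separate.
        undistinguished = {
            (i, j) for i, j in undistinguished
            if (i in occurs[best]) == (j in occurs[best])
        }
    return chosen


def barcodes(strings, chosen):
    """Return the presence/absence vector (the barcode) of each string."""
    return [tuple(c in s for c in chosen) for s in strings]


# Toy input: four short sequences and five hypothetical candidate probes.
sequences = ["acgt", "aggt", "acct", "tgga"]
probes = ["cg", "gg", "cc", "ac", "gt"]
selected = greedy_barcode(sequences, probes)
codes = barcodes(sequences, selected)
```

On this toy input two probes suffice, which matches the lower bound of $\lceil \log_2 4 \rceil = 2$ probes needed to give four strings distinct barcodes. The sketch recomputes every gain in each round, so it illustrates the objective and not the set management or data structures behind the improved time bound.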