On approximating string selection problems with outliers

Authors:
Christina Boucher;Gad M. Landau;Avivit Levy;David Pritchard;Oren Weimann
Affiliations:
Department of Computer Science, University of California, San Diego, USA;Department of Computer Science, University of Haifa, Haifa 31905, Israel and Polytechnic Institute of NYU, Brooklyn, NY 11201-3840, USA;Shenkar College for Engineering and Design, Ramat-Gan 52526, Israel and CRI, University of Haifa, Mount Carmel, Haifa 31905, Israel;CEMC, University of Waterloo, Canada;Department of Computer Science, University of Haifa, Haifa 31905, Israel
Venue:
Theoretical Computer Science
Year:
2013

Abstract

Many problems in bioinformatics involve finding a string that approximately represents a collection of given strings. We study more general problems in which some input strings may be classified as outliers. The Close to Most Strings problem is: given a set S of strings of equal length and a parameter d, find a string x that maximizes the number of "non-outliers", that is, strings of S within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old erroneous proof in the literature. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of the input strings such that the number of positions at which they are not all identical is at most k; we show that it has no PTAS unless P=NP. We also observe that Closest to k Strings has no efficient PTAS (EPTAS) unless a parameterized complexity hierarchy collapses. In sum, outliers help model the problems that arise when working with biological data, but we show that finding an approximate solution is computationally hard.
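
To make the two objective functions in the abstract concrete, here is a minimal Python sketch. It is not an algorithm from the paper: `close_to_most_strings` is an exhaustive search over all candidate strings (exponential in the string length, usable only on toy inputs), and `bad_columns` merely counts the non-identical positions of a chosen subset. All function names, and the choice to pass the alphabet explicitly, are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def count_within(x, strings, d):
    """Number of input strings within Hamming distance d of x (the non-outliers)."""
    return sum(1 for s in strings if hamming(x, s) <= d)


def close_to_most_strings(strings, d, alphabet):
    """Brute-force Close to Most Strings: try every string over the alphabet.

    Illustrative only; the running time is |alphabet| ** length.
    Returns the first best candidate found and its non-outlier count.
    """
    length = len(strings[0])
    best_x, best_count = None, -1
    for cand in product(alphabet, repeat=length):
        x = "".join(cand)
        c = count_within(x, strings, d)
        if c > best_count:
            best_x, best_count = x, c
    return best_x, best_count


def bad_columns(subset):
    """Number of positions at which the strings in subset are not all identical."""
    return sum(1 for col in zip(*subset) if len(set(col)) > 1)


if __name__ == "__main__":
    S = ["0000", "0001", "0011", "1111"]
    x, covered = close_to_most_strings(S, d=1, alphabet="01")
    print(x, covered)
    print(bad_columns(["0000", "0001", "0011"]))
```

In this toy instance no string can be within distance 1 of both "0000" and "1111", so at least one input string must be treated as an outlier. The hardness results in the abstract concern how well the optimum of such objectives can be approximated in polynomial time, which a brute-force search like this does not address.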