On approximating string selection problems with outliers

Authors:
Christina Boucher; Gad M. Landau; Avivit Levy; David Pritchard; Oren Weimann
Affiliations:
Department of Computer Science, University of California, San Diego; Department of Computer Science, University of Haifa, Haifa, Israel, and Polytechnic Institute of NYU, Brooklyn, NY; Shenkar College for Engineering and Design, Ramat-Gan, Israel, and CRI, University of Haifa, Haifa, Israel; CEMC, University of Waterloo, Canada; Department of Computer Science, University of Haifa, Haifa, Israel
Venue:
CPM '12: Proceedings of the 23rd Annual Conference on Combinatorial Pattern Matching
Year:
2012

Abstract

Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. In the Close to Most Strings problem, we are given a set S of same-length strings and a parameter d, and must find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of the input strings such that the number of positions at which the chosen strings are not all identical is at most k; we show it has no PTAS unless P = NP. We also observe that Closest to k Strings has no efficient PTAS (EPTAS) unless the parameterized complexity hierarchy collapses. In sum, outliers help model problems that arise with biological data, but we show that finding approximate solutions to these problems is computationally difficult.
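To make the two objectives above concrete, here is a minimal Python sketch, assuming equal-length strings over a small alphabet. It is not taken from the paper: every function name (`hamming`, `close_to_most_score`, `bad_columns`, and the two `brute_force_*` solvers) is hypothetical, and the exhaustive solvers run in exponential time, so they only show what is being optimized on toy inputs, not how one might approximate it.

```python
# Illustrative sketch only; all names are hypothetical, not from the paper.
from collections.abc import Sequence
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(1 for ca, cb in zip(a, b) if ca != cb)


def close_to_most_score(strings: Sequence[str], x: str, d: int) -> int:
    """Close to Most Strings objective: how many input strings lie within
    Hamming distance d of the candidate center x (the rest are outliers)."""
    return sum(1 for s in strings if hamming(s, x) <= d)


def bad_columns(subset: Sequence[str]) -> int:
    """Number of positions at which the chosen strings are not all identical."""
    return sum(1 for column in zip(*subset) if len(set(column)) > 1)


def brute_force_close_to_most(strings: Sequence[str], d: int,
                              alphabet: str) -> tuple[str, int]:
    """Exact optimum by trying all |alphabet|**L centers (L = string length).
    Exponential in L, so usable only on toy instances."""
    length = len(strings[0])
    best_x, best_score = "", -1
    for letters in product(alphabet, repeat=length):
        x = "".join(letters)
        score = close_to_most_score(strings, x, d)
        if score > best_score:  # strict '>' keeps the first optimum found
            best_x, best_score = x, score
    return best_x, best_score


def brute_force_few_bad_columns(strings: Sequence[str], k: int) -> list[str]:
    """Largest subset with at most k bad columns, trying subset sizes from
    largest to smallest. Exponential in the number of strings."""
    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []  # reached only when the input is empty


if __name__ == "__main__":
    S = ["ACGT", "ACGA", "ACTT", "TTTT"]
    print(close_to_most_score(S, "ACGT", 1))        # 3 (TTTT is an outlier)
    print(brute_force_close_to_most(S, 1, "ACGT"))  # ('ACGT', 3)
    print(brute_force_few_bad_columns(S, 1))        # ['ACGT', 'ACGA']
```

On this toy input no center can be within distance 1 of both ACGT and TTTT, because they differ in three positions and the triangle inequality would require a distance of at most 2; at least one string must therefore be treated as an outlier.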