Linear approximation of shortest superstrings

Authors:
Avrim Blum; Tao Jiang; Ming Li; John Tromp; Mihalis Yannakakis
Affiliations:
Massachusetts Institute of Technology, Cambridge; McMaster Univ., Hamilton, Ont., Canada; Univ. of Waterloo, Waterloo, Ont., Canada; CWI, Amsterdam, The Netherlands; AT&T Bell Labs, Murray Hill, NJ
Venue:
Journal of the ACM (JACM)
Year:
1994

Abstract

We consider the following problem: given a collection of strings s_1, …, s_m, find the shortest string s such that each s_i appears as a substring (a consecutive block) of s. Although this problem is known to be NP-hard, a simple greedy procedure appears to do quite well and is routinely used in DNA sequencing and data compression practice, namely: repeatedly merge the pair of (distinct) strings with maximum overlap until only one string remains. Let n denote the length of the optimal superstring. A common conjecture states that the above greedy procedure produces a superstring of length O(n) (in fact, 2n), yet the only previous nontrivial bound known for any polynomial-time algorithm is a recent O(n log n) result.

We show that the greedy algorithm does in fact achieve a constant factor approximation, proving an upper bound of 4n. Furthermore, we present a simple modified version of the greedy algorithm that we show produces a superstring of length at most 3n. We also show the superstring problem to be MAXSNP-hard, which implies that a polynomial-time approximation scheme for this problem is unlikely.
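
The greedy procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `overlap` and `greedy_superstring` are hypothetical, ties between equally good pairs are broken by iteration order, and the preprocessing step that discards duplicates and strings contained in other strings is an assumption added here for convenience. The sketch covers only the plain greedy merge, not the modified variant with the 3n bound.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the ordered pair of distinct strings with maximum overlap."""
    # Assumption (not stated in the abstract): drop duplicates and any string
    # that already occurs inside another input string.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Examine every ordered pair (i, j) with i != j.
        for i, j in permutations(range(len(items)), 2):
            k = overlap(items[i], items[j])
            if k > best_k:  # ties keep the first pair found
                best_k, best_i, best_j = k, i, j
        # Merge: append the part of the second string beyond the overlap.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)

    return items[0] if items else ""


print(greedy_superstring(["abc", "bcd", "cde"]))  # prints "abcde"
```

The pairwise search is quadratic in the number of strings per round, which is fine for illustration; practical implementations typically precompute overlaps with suffix-based data structures.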