The shortest common superstring problem: average case analysis for both exact and approximate matching

Authors:
En-hui Yang; Zhen Zhang
Affiliations:
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont.;-
Venue:
IEEE Transactions on Information Theory
Year:
2006

Abstract

The shortest common superstring problem and its extension to approximate matching are considered in a probability model in which every string in a given set has the same length and the letters of the strings are drawn independently from a finite set. The savings of a superstring is defined as the difference between the total length of the strings in the given set and the length of the superstring. In the exact matching case, several algorithms proposed in the literature are shown to be asymptotically optimal in the following sense: the ratio of the savings achieved by the superstring that each algorithm constructs to the optimal savings achieved by the shortest superstring converges to 1 in probability as the number of strings in the given set increases. In the approximate matching case, a modified version of the shortest common approximate matching superstring problem is analyzed. It is demonstrated that the optimal savings in this case is given approximately by n log n / I_l(Q, Q, 2D), where n is the number of strings in the given set, Q is the probability distribution governing the selection of the letters of the strings, I_l(Q, Q, 2D) is the lower mutual information between Q and Q with respect to 2D, and D ≥ 0 is the distortion allowed in approximate matching. In addition, an approximation algorithm is proposed and proved to be asymptotically optimal.
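To make the notion of savings in the exact matching case concrete, the following minimal Python sketch draws equal-length strings with independent letters and measures the savings of a superstring built by the textbook greedy merge heuristic. The abstract does not name the algorithms it analyzes, so the greedy heuristic here is an assumption chosen for illustration, and the names `random_strings`, `overlap`, `greedy_superstring`, and `savings` are hypothetical. The sketch uses a uniform letter distribution, whereas the abstract allows a general distribution Q.

```python
import random


def random_strings(n, length, alphabet="ab", seed=0):
    """Draw n strings of equal length with i.i.d. uniform letters (assumed model)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]


def overlap(a, b):
    """Length of the longest proper-or-full suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the pair with the largest overlap (illustrative heuristic)."""
    # Remove duplicates, then strings that are substrings of other strings.
    pool = list(dict.fromkeys(strings))
    pool = [s for s in pool if not any(s != t and s in t for t in pool)]
    while len(pool) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left string, index of right string)
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = pool[i] + pool[j][k:]
        # Drop the two merged strings and anything the merged string now contains.
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j) and s not in merged]
        pool.append(merged)
    return pool[0] if pool else ""


def savings(strings, superstring):
    """Total length of the input strings minus the length of the superstring."""
    return sum(len(s) for s in strings) - len(superstring)
```

For example, calling `greedy_superstring(["abc", "bcd", "cde"])` merges the three strings into `"abcde"`, so `savings` reports 9 - 5 = 4. Computing the optimal savings for comparison would require an exact solver, which this sketch does not attempt.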