Towards a DNA sequencing theory (learning a string)

Authors:
M. Li
Affiliations:
Waterloo Univ., Ont., Canada
Venue:
SFCS '90 Proceedings of the 31st Annual Symposium on Foundations of Computer Science
Year:
1990

Citing 0
Cited 11

Sharpening Occam's Razor

COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
On the Approximation Ratio of the Group-Merge Algorithm for the Shortest Common Superstring Problem

SOFSEM '00 Proceedings of the 27th Conference on Current Trends in Theory and Practice of Informatics
Lower Bounds for Approximating Shortest Superstrings over an Alphabet of Size 2

WG '99 Proceedings of the 25th International Workshop on Graph-Theoretic Concepts in Computer Science
Why Greed Works for Shortest Common Superstring Problem

CPM '08 Proceedings of the 19th annual symposium on Combinatorial Pattern Matching
Why greed works for shortest common superstring problem

Theoretical Computer Science
Shortest common superstring problem with discrete neural networks

ICANNGA'09 Proceedings of the 9th international conference on Adaptive and natural computing algorithms
Approximation algorithms for NP-hard optimization problems

Algorithms and theory of computation handbook
Covering analysis of the greedy algorithm for partial cover

Algorithms and Applications
A bibliography on computational molecular biology and genetics

Mathematical and Computer Modelling: An International Journal
Improved inapproximability results for the shortest superstring and related problems

CATS '13 Proceedings of the Nineteenth Computing: The Australasian Theory Symposium - Volume 141
A probabilistic PTAS for shortest common superstring

Theoretical Computer Science

Abstract

Mathematical frameworks suitable for massive automated DNA sequencing and for analyzing DNA sequencing algorithms are studied under plausible assumptions. The DNA sequencing problem is modeled as learning a superstring from its randomly drawn substrings. Under certain restrictions, this may be viewed as learning a superstring in L.G. Valiant's (1984) learning model, and in this case the author gives an efficient algorithm for learning a superstring and a quantitative bound on how many samples suffice. A major obstacle to the approach turns out to be a quite well-known open question on how to approximate the shortest common superstring of a set of strings. The author presents the first provably good algorithm that approximates the shortest superstring of length n by a superstring of length O(n log n).
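
To make the superstring-approximation problem concrete, the following is a minimal sketch of the standard greedy merge heuristic: repeatedly merge the two fragments with the largest suffix-prefix overlap. It is not the algorithm from the paper, whose O(n log n) construction is not described in this record. The function names `overlap` and `greedy_superstring` and the sample fragments are illustrative assumptions.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Greedy merge heuristic (illustrative; not the paper's algorithm)."""
    # Drop exact duplicates, then drop fragments contained in another fragment.
    unique = list(dict.fromkeys(fragments))
    strings = [s for s in unique if not any(s != t and s in t for t in unique)]

    while len(strings) > 1:
        # Find the ordered pair with the largest overlap (first found on ties).
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(strings)), 2):
            k = overlap(strings[i], strings[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j

        # Merge the pair, writing the shared overlap only once.
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (best_i, best_j)]
        strings.append(merged)

    return strings[0] if strings else ""


if __name__ == "__main__":
    reads = ["abc", "bcd", "cde"]  # hypothetical sample fragments
    print(greedy_superstring(reads))
```

Every input fragment remains a substring of the result, because each merge step preserves both merged strings. The result is a common superstring but not necessarily a shortest one.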