We are given a collection of m random subsequences (traces) of a string t of length n, where each trace is obtained by deleting each bit of the string independently with probability q. Our goal is to exactly reconstruct the string t from these observed traces. We initiate here a study of the deletion rates for which the original string can be successfully reconstructed from a small number of samples. We investigate a simple reconstruction algorithm called Bitwise Majority Alignment that uses majority voting (with suitable shifts) to determine each bit of the original string. We show that for random strings t, we can reconstruct the original string (w.h.p.) for q = O(1/log n) using only O(log n) samples. For arbitrary strings t, we show that a simple modification of Bitwise Majority Alignment reconstructs a string that has identical structure to the original string (w.h.p.) for q = O(1/n^(1/2+ε)) using O(1) samples. In this case, using O(n log n) samples, we can reconstruct the original string exactly. Our setting can be viewed as the study of an idealized biological evolutionary process in which the only possible mutations are random deletions. Our goal is to understand at what mutation rates a small number of observed samples can be correctly aligned to reconstruct the parent string. In the process of establishing these results, we show that Bitwise Majority Alignment has an interesting self-correcting property whereby local distortions in the traces do not generate errors in the reconstruction and eventually get corrected.
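The deletion channel and the voting-with-shifts idea described above can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the paper's exact procedure. The function names (deletion_channel, bitwise_majority_alignment), the one-pointer-per-trace rule, and the tie-breaking choice are all hypothetical. The modification used for arbitrary strings is not attempted here.

```python
import random


def deletion_channel(t, q, rng):
    """Return one trace of t: each bit is deleted independently with probability q."""
    return [bit for bit in t if rng.random() >= q]


def bitwise_majority_alignment(traces, n):
    """Simplified sketch of majority voting with shifts (assumed variant).

    Each trace keeps a pointer to its current bit. At every output position
    the pointed-to bits vote; only traces that agree with the majority advance
    their pointer. A trace that disagrees is assumed to have lost the current
    bit, so its pointer stays put (this is the "shift").
    """
    pointers = [0] * len(traces)
    reconstructed = []
    for _ in range(n):
        # Collect votes from traces that still have bits left.
        votes = [tr[p] for tr, p in zip(traces, pointers) if p < len(tr)]
        ones = sum(votes)
        # Ties, and the case of no votes at all, are resolved to 0 (arbitrary choice).
        bit = 1 if 2 * ones > len(votes) else 0
        reconstructed.append(bit)
        for j, tr in enumerate(traces):
            if pointers[j] < len(tr) and tr[pointers[j]] == bit:
                pointers[j] += 1
    return reconstructed


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, q = 64, 12, 0.02  # illustrative parameters only
    t = [rng.randint(0, 1) for _ in range(n)]
    traces = [deletion_channel(t, q, rng) for _ in range(m)]
    guess = bitwise_majority_alignment(traces, n)
    print("exact reconstruction:", guess == t)
```

Whether the demo run recovers t exactly depends on the random draw and the chosen parameters. The sketch is only meant to make the pointer-shifting mechanism concrete.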