Discrete logarithms in GF(P) using the number field sieve
SIAM Journal on Discrete Mathematics
Communication complexity of document exchange
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Introduction to Coding Theory
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
A sublinear algorithm for weakly approximating edit distance
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Algorithmic Applications of Low-Distortion Geometric Embeddings
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Efficient randomized pattern-matching algorithms
IBM Journal of Research and Development - Mathematics and computing
Approximating Edit Distance Efficiently
FOCS '04 Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
Oblivious string embeddings and edit distance approximations
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Improved lower bounds for embeddings into L1
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Stable distributions, pseudorandom generators, embeddings, and data stream computation
Journal of the ACM (JACM)
Low distortion embeddings for edit distance
Journal of the ACM (JACM)
Earth mover distance over high-dimensional spaces
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Explicit Non-adaptive Combinatorial Group Testing Schemes
ICALP '08 Proceedings of the 35th international colloquium on Automata, Languages and Programming, Part I
From coding theory to efficient pattern matching
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Sketching techniques for collaborative filtering
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Exact and Approximate Pattern Matching in the Streaming Model
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Approximate sparse recovery: optimizing time and measurements
Proceedings of the forty-second ACM symposium on Theory of computing
An optimal algorithm for the distinct elements problem
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
FOCS '10 Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
Fingerprinting ratings for collaborative filtering: theoretical and empirical analysis
SPIRE'10 Proceedings of the 17th international conference on String processing and information retrieval
Exponential time improvement for min-wise based algorithms
Information and Computation
Theory and applications of b-bit minwise hashing
Communications of the ACM
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Fast moment estimation in data streams in optimal space
Proceedings of the forty-third annual ACM symposium on Theory of computing
Efficiently decodable error-correcting list disjunct matrices and applications
ICALP'11 Proceedings of the 38th international colloquium conference on Automata, languages and programming - Volume Part I
Periodicity and cyclic shifts via linear sketches
APPROX'11/RANDOM'11 Proceedings of the 14th international workshop and 15th international conference on Approximation, randomization, and combinatorial optimization: algorithms and techniques
Analyzing graph structure via linear measurements
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Sublinear time, measurement-optimal, sparse recovery for all
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Graph sketches: sparsification, spanners, and subgraphs
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Worst-case optimal join algorithms: [extended abstract]
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
IEEE Transactions on Information Theory
IEEE Transactions on Information Theory
Improved sketching of hamming distance with error correcting
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Efficient communication protocols for deciding edit distance
ESA'12 Proceedings of the 20th Annual European conference on Algorithms
Fingerprinting is a widely used technique for efficiently verifying that two files are identical. More generally, linear sketching is a form of lossy compression (based on random projections) that also enables the "dissimilarity" of non-identical files to be estimated. Many sketches have been proposed for dissimilarity measures that decompose coordinate-wise, such as the Hamming distance between alphanumeric strings or the Euclidean distance between vectors. However, virtually nothing is known about sketches that accommodate alignment errors. With such errors, Hamming and Euclidean distances are rendered useless: a small misalignment may result in a file that looks very dissimilar to the original file according to such measures. In this paper, we present the first linear sketch that is robust to a small number of alignment errors. Specifically, the sketch can be used to determine whether two files are within a small Hamming distance of being a cyclic shift of each other. Furthermore, the sketch is homomorphic with respect to rotations: it is possible to construct the sketch of a cyclic shift of a file given only the sketch of the original file. The relevant dissimilarity measure, known as the shift distance, arises in the context of embedding edit distance, and our result addresses an open problem [Question 13 in Indyk-McGregor-Newman-Onak'11] with a rather surprising outcome. Our sketch projects a length-$n$ file into $D(n) \cdot \operatorname{polylog} n$ dimensions, where $D(n) \ll n$ is the number of divisors of $n$. The striking fact is that this is near-optimal, i.e., the $D(n)$ dependence is inherent to a problem that is ostensibly about lossy compression. In contrast, we then show that any sketch for estimating the edit distance between two files, even when that distance is small, must have size nearly linear in $n$.
This lower bound addresses a long-standing open problem on low-distortion embeddings of edit distance [Question 2.15 in Naor-Matousek'11, Indyk'01], for the case of linear embeddings.
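To make the quantities in the abstract concrete, the following is a minimal brute-force sketch of the shift distance (taken here, as an assumption, to be the smallest Hamming distance between one file and any cyclic shift of the other) and of the divisor count $D(n)$. It is not the linear sketch from the paper; the function names `hamming`, `rotate`, `shift_distance`, and `num_divisors` are illustrative and do not come from the source.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def rotate(s: str, k: int) -> str:
    """Cyclic left shift of s by k positions."""
    if not s:
        return s
    k %= len(s)
    return s[k:] + s[:k]


def shift_distance(a: str, b: str) -> int:
    """Brute force: minimum Hamming distance over all cyclic shifts of a.

    Assumed definition for illustration; runs in O(n^2) time and reads
    both files in full, unlike a sketch-based estimate.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    if not a:
        return 0
    return min(hamming(rotate(a, k), b) for k in range(len(a)))


def num_divisors(n: int) -> int:
    """D(n): the number of positive divisors of n (naive O(n) count)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


original = "abcdefgh"
shifted = rotate(original, 1)   # "bcdefgha": a one-position misalignment
noisy = "bcdxfgha"              # the same shift plus one substitution

print(hamming(original, shifted))        # every coordinate disagrees
print(shift_distance(original, shifted)) # the misalignment is absorbed
print(shift_distance(original, noisy))   # only the substitution remains
print(num_divisors(12), num_divisors(13))
```

The example shows why coordinate-wise measures fail under misalignment: a single rotation makes the Hamming distance maximal, while the shift distance only counts the substitution. The divisor counts illustrate how much $D(n)$ can vary between nearby file lengths.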