Finding a duplicate and a missing item in a stream

Authors:
Jun Tarui
Affiliations:
Department of Information and Communication Engineering, University of Electro-Communications, Chofu, Tokyo, Japan
Venue:
TAMC'07 Proceedings of the 4th international conference on Theory and applications of models of computation
Year:
2007

Citing (8)

Threshold functions and bounded depth monotone circuits. Journal of Computer and System Sciences.
Computational limitations of small-depth circuits (book).
The complexity of finite functions. Handbook of theoretical computer science (vol. A).
The space complexity of approximating the frequency moments. STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing.
Communication complexity (book).
A theorem on probabilistic constant depth Computations. STOC '84 Proceedings of the sixteenth annual ACM symposium on Theory of computing.
Read-Once Branching Programs, Rectangular Proofs of the Pigeonhole Principle and the Transversal Calculus. Combinatorica.
Data streams: algorithms and applications. Foundations and Trends® in Theoretical Computer Science.

Cited by (2)

Finding duplicates in a data stream. SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms.
Tight bounds for Lp samplers, finding duplicates in streams, and related problems. Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.

Abstract

We consider the following problem in a stream model: Given a sequence a = 〈a_1, a_2, ..., a_m〉 with each a_i ∈ [n] = {1, ..., n} and m > n, find a duplicate in the sequence, i.e., find some d = a_i = a_l with i ≠ l, using s bits of memory and r passes over the input sequence. In one pass an algorithm reads the input sequence a in the order a_1, a_2, ..., a_m. Since m > n, a duplicate exists by the pigeonhole principle. Muthukrishnan [Mu05a], [Mu05b] has posed the following question for the case where m = n + 1: For s = O(log n), is there a solution with a constant number of passes? The problem as stated above generalizes Muthukrishnan's question by taking the sequence length m as a parameter. We give a negative answer to the original question by showing the following: Assume that m = n + 1. A streaming algorithm with O(log n) space requires Ω(log n / log log n) passes; a k-pass streaming algorithm requires Ω(n^{1/(2k-1)}) space. We also consider the following problem of finding a missing item: Assuming that n < m, find x ∈ [m] such that x ≠ a_j for 1 ≤ j ≤ n. The same lower bound applies to the missing-item finding problem. The proof is a simple reduction to the communication complexity of a relation. We also consider one-pass algorithms and exactly determine the minimum space required. Interesting open questions such as the following remain. For the number of passes of algorithms using O(log n) space, show an ω(1) lower bound (or an O(1) upper bound) for: (1) duplicate finding for m = 2n, (2) missing-item finding for m = 2n, and (3) the case where we allow Las-Vegas type randomization for m = n + 1.
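
To make the pass/space trade-off in the problem statement concrete, here is a minimal Python sketch of a well-known baseline: a binary search over the value range [1, n] that keeps only a few counters (O(log n) bits) and uses about log2(n) passes. It is not taken from the paper, and the names `find_duplicate` and `make_pass` are hypothetical; `make_pass` is assumed to replay the stream from the start each time it is called.

```python
from typing import Callable, Iterable, Tuple


def find_duplicate(make_pass: Callable[[], Iterable[int]], n: int) -> Tuple[int, int]:
    """Return (duplicate, passes_used) for a stream of more than n items from [1, n].

    Assumption: make_pass() yields the same sequence on every call (one call = one pass).
    """
    lo, hi = 1, n
    passes = 0
    # Invariant: the stream holds more than (hi - lo + 1) items with value in [lo, hi],
    # so by the pigeonhole principle some value in [lo, hi] is repeated.
    while lo < hi:
        mid = (lo + hi) // 2
        count = 0
        for x in make_pass():  # one full pass, storing only a single counter
            if lo <= x <= mid:
                count += 1
        passes += 1
        if count > mid - lo + 1:
            hi = mid  # the lower half is overfull
        else:
            lo = mid + 1  # otherwise the upper half must be overfull
    return lo, passes


if __name__ == "__main__":
    stream = [3, 1, 4, 2, 3]  # m = 5 items over [4], so m = n + 1
    print(find_duplicate(lambda: iter(stream), 4))
```

This baseline needs roughly log2(n) passes, which sits above the Ω(log n / log log n) pass lower bound stated in the abstract for O(log n) space when m = n + 1.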