Efficient one-pass estimation of $F_0$, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of $F_0$: estimating the number of distinct elements in a data stream where each element of the stream is not a single integer but an interval of integers. We present a randomized algorithm that yields an $(\epsilon, \delta)$-approximation of $F_0$ with the following time and space complexities ($n$ is the size of the universe of the items): (1) the amortized processing time per interval is $O(\log{\frac{1}{\delta}}\log \frac{n}{\epsilon})$; (2) the workspace used is $O(\frac{1}{\epsilon^2}\log{\frac{1}{\delta}}\log n)$ bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar and Sivakumar [Proceedings of the 13th ACM-SIAM Symposium on Discrete Algorithms (SODA), 2002, pp. 623-632], which requires $O(\frac{1}{\epsilon^5} \log{\frac{1}{\delta}}\log^5 n)$ processing time per item. The algorithm can also be used to compute the max-dominance norm of a stream of multiple signals, where it significantly improves upon the previous best time and space bounds by Cormode and Muthukrishnan [Proceedings of the 11th European Symposium on Algorithms (ESA), Lecture Notes in Comput. Sci. 2938, Springer, Berlin, 2003, pp. 148-160]. It also provides an efficient solution to the distinct summation problem, which arises during data aggregation in sensor networks [Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, ACM Press, New York, 2004, pp. 250-262; Proceedings of the 20th International Conference on Data Engineering (ICDE), 2004, pp. 449-460].
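The key to range efficiency is handling an interval without touching each of its integers. The Python sketch below illustrates that idea under stated assumptions and is not the paper's algorithm. We assume a hash $h(x) = (a x + b) \bmod p$ with a prime $p > n$. An element is assumed to be sampled at level $\ell$ when $h(x) < \lfloor p / 2^\ell \rfloor$. To count the sampled integers in $[lo, hi]$, the sketch uses a floor-sum identity, a standard technique swapped in here that may differ from the paper's own counting procedure. The names `floor_sum`, `count_sampled`, `sampled_points` and `RangeF0Sketch` are hypothetical. This is a single copy, so it makes no attempt to match the bounds stated above. A full $(\epsilon, \delta)$ estimator would typically take the median of $O(\log \frac{1}{\delta})$ independent copies, each with a capacity on the order of $\frac{1}{\epsilon^2}$.

```python
def floor_sum(n, m, a, b):
    """Return sum(floor((a*i + b) / m) for i in range(n)), n >= 0, m >= 1.

    Euclid-style reduction (the scheme used, e.g., by AtCoder Library's
    floor_sum): O(log m) iterations instead of n.
    """
    ans = 0
    # Normalise a, b into [0, m); divmod floors, so negatives work too.
    q, a = divmod(a, m)
    ans += q * (n * (n - 1) // 2)
    q, b = divmod(b, m)
    ans += q * n
    while True:
        if a >= m:
            ans += (n * (n - 1) // 2) * (a // m)
            a %= m
        if b >= m:
            ans += n * (b // m)
            b %= m
        y_max = a * n + b
        if y_max < m:
            break
        n, b = divmod(y_max, m)
        m, a = a, m  # swap the roles of a and m, as in Euclid's algorithm
    return ans


def count_sampled(lo, hi, a, b, p, t):
    """Count x in [lo, hi] with (a*x + b) % p < t, assuming 0 <= t <= p.

    For 0 <= t <= p: (y % p) < t exactly when y//p - (y - t)//p == 1,
    and the difference is 0 otherwise.
    """
    if hi < lo:
        return 0
    n = hi - lo + 1
    y0 = a * lo + b  # substitute x = lo + i for i in range(n)
    return floor_sum(n, p, a, y0) - floor_sum(n, p, a, y0 - t)


def sampled_points(lo, hi, a, b, p, t):
    """List the sampled x in [lo, hi] in increasing order, locating each
    one by binary search over count_sampled."""
    out = []
    while lo <= hi and count_sampled(lo, hi, a, b, p, t) > 0:
        left, right = lo, hi
        while left < right:
            mid = (left + right) // 2
            if count_sampled(lo, mid, a, b, p, t) > 0:
                right = mid
            else:
                left = mid + 1
        out.append(left)
        lo = left + 1
    return out


class RangeF0Sketch:
    """One copy of a level-sampling distinct-count sketch (hypothetical).

    Invariant: sample == {x in inserted intervals : h(x) < p >> level}.
    """

    def __init__(self, capacity, p, a, b):
        self.capacity = capacity          # ~1/eps^2 in a real setting (assumption)
        self.p, self.a, self.b = p, a, b  # h(x) = (a*x + b) % p (assumption)
        self.level = 0
        self.sample = set()

    def _hash(self, x):
        return (self.a * x + self.b) % self.p

    def add_interval(self, lo, hi):
        while True:
            t = self.p >> self.level  # threshold, roughly p / 2**level
            k = count_sampled(lo, hi, self.a, self.b, self.p, t)
            # Conservative: some of the k hits may already be in the sample.
            if len(self.sample) + k <= self.capacity:
                break
            # Too many hits: halve the sampling rate and drop elements
            # whose hash is no longer below the new threshold.
            self.level += 1
            t = self.p >> self.level
            self.sample = {x for x in self.sample if self._hash(x) < t}
        self.sample.update(sampled_points(lo, hi, self.a, self.b, self.p, t))

    def estimate(self):
        t = self.p >> self.level
        # If h(x) is uniform on [0, p), each element is kept with prob. t/p.
        return len(self.sample) * self.p / t if t else 0.0


if __name__ == "__main__":
    P = 2**31 - 1  # a Mersenne prime, larger than the universe used here
    sketch = RangeF0Sketch(capacity=64, p=P, a=1103515245, b=12345)
    for lo, hi in [(0, 5000), (2500, 9000), (100, 200)]:
        sketch.add_interval(lo, hi)
    print(sketch.level, sketch.estimate())  # true F0 of the union is 9001
```

Each interval costs a constant number of $O(\log p)$ counting calls, plus $O(\log n)$ calls for every sampled point that is reported. The work therefore depends on how many sampled points the interval contains rather than on its length. Keeping the sampled elements themselves, not only a counter, is what lets overlapping intervals be counted once.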