The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Synopsis data structures for massive data sets
Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Data streams: algorithms and applications
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sketch-based change detection: methods, evaluation, and applications
Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Reversible sketches for efficient and accurate change detection over network data streams
Proceedings of the 4th ACM SIGCOMM conference on Internet measurement
An improved data stream summary: the count-min sketch and its applications
Journal of Algorithms
The eternal sunshine of the sketch data structure
Computer Networks: The International Journal of Computer and Telecommunications Networking
Sequential hashing: A flexible approach for unveiling significant patterns in high speed networks
Computer Networks: The International Journal of Computer and Telecommunications Networking
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Hi-index | 0.00 |
Sketch is a sublinear space data structure that allows one to approximately reconstruct the value associated with any given key in an input data stream. It is the basis for answering a number of fundamental queries on data streams, such as range queries, finding quantiles, frequent items, etc. In the networking context, sketch has been applied to identifying heavy hitters and changes, which is critical for traffic monitoring, accounting, and network anomaly detection. In this paper, we propose a novel approach called lsquare to significantly improve the reconstruction accuracy of the sketch data structure. Given a sketch and a set of keys, we estimate the values associated with these keys by constructing a linear system and finding the optimal solution for the system using linear least squares method. We use a large amount of real Internet traffic data to evaluate lsquare against countmin, the state-of-the-art sketch scheme. Our results suggest that given the same memory requirement, lsquare achieves much better reconstruction accuracy than countmin. Alternatively, given the same reconstruction accuracy, lsquare requires significantly less memory. This clearly demonstrates the effectiveness of our approach.