Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems (TOCS)
Sketch-based change detection: methods, evaluation, and applications
Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Network performance monitoring at small time scales
Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Improving sketch reconstruction accuracy using linear least squares method
IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding hierarchical heavy hitters in data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Reversible sketches: enabling monitoring and analysis over high-speed data streams
IEEE/ACM Transactions on Networking (TON)
Probabilistic lossy counting: an efficient algorithm for finding heavy hitters
ACM SIGCOMM Computer Communication Review
Mining approximate frequent closed flows over packet streams
DaWaK'11 Proceedings of the 13th international conference on Data warehousing and knowledge discovery
Hi-index | 0.00 |
Identification of significant patterns in network traffic, such as IPs or flows that contribute large volume (heavy hitters) or those that introduce large changes of volume (heavy changers), has many applications in accounting and network anomaly detection. As network speed and the number of flows grow rapidly, identifying heavy hitters/changers by tracking per-IP or per-flow statistics becomes infeasible due to both the computational overhead and memory requirements. In this paper, we propose SeqHash, a novel sequential hashing scheme that supports fast and accurate recovery of heavy hitters/changers, while requiring memory just slightly higher than the theoretical lower bound. SeqHash monitors data traffic using a sketch data structure that can flexibly trade-off between the memory usage and the computational overhead in a large range that can be utilized by different computer architectures for optimizing the overall performance. In addition, we propose statistically efficient algorithms for estimating the values of heavy hitters/changers. Using both mathematical analysis and experimental studies of Internet traces, we demonstrate that SeqHash can achieve the same accuracy as the existing methods do but using much less memory and computational overhead.