Flow-level information is important for many applications in network measurement and analysis. In this work, we tackle the "Top Spreaders" and "Top Scanners" problems, in which the hosts that spread the largest numbers of flows, especially small flows, must be identified efficiently and accurately. Identifying these top users can be very helpful in network management, traffic engineering, application behavior analysis, and anomaly detection. We propose novel streaming algorithms and a "Filter-Tracker-Digester" framework to catch the top spreaders and scanners online. Our framework combines sampling with streaming algorithms, and deterministic with randomized algorithms, so that they complement each other to improve accuracy while reducing memory usage and processing time. To our knowledge, we are the first to tackle the "Top Scanners" problem in a streaming way. We address several challenges, namely traffic scale, skewness, speed, memory usage, and result accuracy. We derive the performance bounds of our algorithms analytically and evaluate them on both real and synthetic traces, where we show that our algorithms achieve accuracy and speed at least an order of magnitude higher than existing approaches.
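To make the problem statement concrete, the sketch below computes top spreaders exactly, by storing every distinct destination seen for each source. It is not the Filter-Tracker-Digester framework or any algorithm from this work. It is a naive baseline whose memory grows with the number of distinct flows, which is the cost a streaming approach is meant to avoid. The function name `top_spreaders`, the treatment of a flow as a distinct (source, destination) pair, and the sample addresses are all assumptions made for illustration.

```python
from collections import defaultdict


def top_spreaders(packets, k):
    """Return the k sources with the most distinct destinations.

    Hypothetical exact baseline: `packets` is an iterable of
    (source, destination) pairs, and a "flow" is assumed to be one
    distinct pair. Memory use is proportional to the number of flows.
    """
    destinations = defaultdict(set)
    for src, dst in packets:
        destinations[src].add(dst)  # repeated pairs are counted once

    # Rank by fan-out (descending); break ties by source for determinism.
    ranked = sorted(
        ((src, len(dsts)) for src, dsts in destinations.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:k]


# Made-up sample traffic, for illustration only.
packets = [
    ("10.0.0.1", "10.0.1.1"),
    ("10.0.0.1", "10.0.1.2"),
    ("10.0.0.1", "10.0.1.2"),  # same flow again
    ("10.0.0.2", "10.0.1.1"),
    ("10.0.0.3", "10.0.1.1"),
    ("10.0.0.3", "10.0.1.3"),
    ("10.0.0.3", "10.0.1.4"),
]

print(top_spreaders(packets, 2))
# → [('10.0.0.3', 3), ('10.0.0.1', 2)]
```

On a high-speed link the per-source sets in this baseline would not fit in fast memory, which is why the abstract emphasizes memory usage and processing time alongside accuracy.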