A simple indicator of a network anomaly is a rapid increase in the total number of distinct network connections. Maintaining an accurate estimate of the current total number of distinct connections is fairly easy with streaming algorithms that have both low space and low computational complexity. Efficiently identifying the network entities involved in the largest number of distinct connections is considerably harder. In this paper, we study the problem of finding all entities whose number of distinct (outgoing or incoming) network connections is at least a specific fraction of the total number of distinct connections. These entities are referred to as heavy distinct hitters. Since this problem is hard in general, we focus on randomized approximation techniques and propose a sampling-based and a sketch-based streaming algorithm. Both algorithms output a list of potential heavy distinct hitters together with the estimated number of distinct connections of each. We prove that, depending on the required accuracy of the output list, the space complexities of the presented algorithms are asymptotically optimal up to small logarithmic factors. Additionally, the algorithms are evaluated and compared on real network data to determine their usefulness in practice.
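To make the problem definition concrete, the following is a minimal sketch of an exact, non-streaming baseline. It is not the sampling-based or sketch-based algorithm proposed in the paper. It assumes a connection can be modeled as a `(source, destination)` pair and counts outgoing connections only; the function name `exact_heavy_distinct_hitters` and the parameter name `phi` (the required fraction) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def exact_heavy_distinct_hitters(connections, phi):
    """Return {source: distinct_count} for every source whose number of
    distinct outgoing connections is at least phi times the total number
    of distinct connections.

    Assumption: a connection is a (source, destination) pair.
    """
    # Repeated occurrences of the same pair count only once.
    distinct = set(connections)

    # Count distinct destinations per source.
    per_source = defaultdict(int)
    for src, _dst in distinct:
        per_source[src] += 1

    threshold = phi * len(distinct)
    return {src: n for src, n in per_source.items() if n >= threshold}


# Hypothetical toy stream: ("a", "x") appears twice but counts once.
stream = [("a", "x"), ("a", "y"), ("a", "x"),
          ("b", "x"), ("a", "z"), ("c", "x")]
hitters = exact_heavy_distinct_hitters(stream, 0.5)
print(hitters)  # {'a': 3}
```

This baseline stores every distinct connection, so its memory grows linearly with the number of distinct connections. Avoiding that cost is the motivation for approximate streaming algorithms such as the ones the abstract describes.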