Motivated by scenarios in network anomaly detection, we consider the problem of detecting persistent items in a data stream: items that occur "regularly" in the stream. Unlike heavy hitters, persistent items do not necessarily contribute significantly to the volume of a stream, and so may escape detection by traditional volume-based anomaly detectors. We first show that any online algorithm that tracks persistent items exactly must use a large workspace, which makes it infeasible to run on a traffic monitoring node. In light of this lower bound, we introduce an approximate formulation of the problem and present a small-space algorithm that approximately tracks persistent items over a large data stream. Our experiments on a real traffic dataset show that, in typical cases, the algorithm achieves a physical space compression of 5x-7x while incurring very few false positives (
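To make the notion of persistence concrete, here is a minimal Python sketch under assumptions that the abstract does not state: the stream arrives as (timeslot, item) pairs in non-decreasing slot order, an item's persistence is the number of distinct timeslots in which it appears, and an item counts as persistent when that number reaches a fraction `alpha` of all `num_slots` slots. `exact_persistence` is the naive exact tracker; its memory grows with the number of distinct items, in line with the lower bound above. `SampledPersistenceTracker` is a hypothetical sample-and-hold style approximation (a generic technique, not necessarily the algorithm the abstract describes): it admits an item only when a hash of the (item, timeslot) pair falls below a sampling probability `p`, and from then on counts the item's distinct timeslots exactly. All names here are illustrative.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def exact_persistence(stream: Iterable[Tuple[int, Hashable]]) -> Dict[Hashable, int]:
    """Exact baseline: number of distinct timeslots each item appears in.

    Assumes slots arrive in non-decreasing order. Keeps one entry per
    distinct item, so memory grows with the number of distinct items.
    """
    last_slot: Dict[Hashable, int] = {}
    persistence: Dict[Hashable, int] = defaultdict(int)
    for slot, item in stream:
        # Only the first occurrence of an item within a slot is counted.
        if last_slot.get(item) != slot:
            last_slot[item] = slot
            persistence[item] += 1
    return dict(persistence)


def _unit_hash(item: Hashable, slot: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for an (item, slot) pair."""
    digest = hashlib.blake2b(f"{item!r}|{slot}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class SampledPersistenceTracker:
    """Hypothetical sample-and-hold style approximation (illustrative only)."""

    def __init__(self, p: float) -> None:
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        self.p = p
        # item -> (last slot counted, number of distinct slots counted)
        self.held: Dict[Hashable, Tuple[int, int]] = {}

    def observe(self, slot: int, item: Hashable) -> None:
        entry = self.held.get(item)
        if entry is not None:
            last, count = entry
            if last != slot:
                # Held item seen in a new slot: count it exactly.
                self.held[item] = (slot, count + 1)
        elif _unit_hash(item, slot) < self.p:
            # Admit the item on a sampled (item, slot) pair; this slot counts.
            self.held[item] = (slot, 1)

    def estimate(self, item: Hashable) -> float:
        entry = self.held.get(item)
        if entry is None:
            return 0.0
        # Rough correction for slots missed before admission: (1 - p) / p is
        # the mean number of failed Bernoulli(p) trials before the first success.
        return entry[1] + (1.0 - self.p) / self.p

    def persistent_items(self, num_slots: int, alpha: float) -> Set[Hashable]:
        """Held items whose estimated persistence reaches alpha * num_slots."""
        threshold = alpha * num_slots
        return {item for item in self.held if self.estimate(item) >= threshold}
```

Hashing the (item, timeslot) pair, rather than sampling individual packets, gives every occurrence of an item within one slot the same admission decision, so a high-volume item is no more likely to be admitted than a low-volume one; this keeps the sketch focused on persistence rather than volume. With `p = 1` every item is held and the estimates are exact; a smaller `p` trades accuracy for fewer held entries.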