FIDS: Monitoring Frequent Items over Distributed Data Streams

Authors:
Robert Fuller;Mehmed Kantardzic
Affiliations:
Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY 40292;Computer Engineering and Computer Science Department, University of Louisville, Louisville, KY 40292
Venue:
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2007

Abstract

Many applications require the discovery of items that occur frequently within multiple distributed data streams. Past solutions to this problem either require a high degree of error tolerance or can provide results only periodically. In this paper we introduce a new algorithm for continuously tracking frequent items over distributed data streams, providing either exact or approximate answers. We tested the efficiency of our method on two real-world data sets. The results indicated a significant reduction in communication cost compared with naïve approaches and with an existing efficient algorithm called Top-K Monitoring. Because our method does not rely on approximation to reduce communication overhead and is explicitly designed for tracking frequent items, it also produces higher-quality tracking results.
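
To make the problem setting concrete, the following is a minimal sketch of one possible naïve baseline, not the FIDS algorithm itself. It assumes that every node forwards every item to a central coordinator, and that an item counts as frequent when its global count reaches a fixed fraction of all items seen. The names `NaiveCoordinator`, `run_naive`, and `support` are hypothetical and do not come from the paper; the paper's own definition of the naïve approaches may differ.

```python
from collections import Counter


class NaiveCoordinator:
    """Hypothetical baseline: the coordinator receives every item from every node."""

    def __init__(self, support):
        self.support = support    # assumed frequency threshold, a fraction in (0, 1]
        self.counts = Counter()   # global count per item
        self.total = 0            # total number of items seen across all streams
        self.messages = 0         # communication cost: one message per item

    def receive(self, item):
        # Each forwarded item costs exactly one message in this baseline.
        self.counts[item] += 1
        self.total += 1
        self.messages += 1

    def frequent_items(self):
        # An item is frequent if its global count reaches support * total.
        threshold = self.support * self.total
        return {item for item, count in self.counts.items() if count >= threshold}


def run_naive(streams, support):
    """Feed every item of every stream to a single coordinator."""
    coordinator = NaiveCoordinator(support)
    for stream in streams:
        for item in stream:
            coordinator.receive(item)
    return coordinator


if __name__ == "__main__":
    # Three toy streams standing in for three distributed nodes.
    streams = [["a", "b", "a"], ["a", "c"], ["b", "a"]]
    coordinator = run_naive(streams, support=0.5)
    print(coordinator.frequent_items(), coordinator.messages)
```

In this sketch the answer is always exact and always current, but the message count equals the total stream length. That communication cost is what the abstract reports reducing.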