Beyond market baskets: generalizing association rules to correlations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
On the security of pay-per-click and other Web advertising schemes
WWW '99 Proceedings of the eighth international conference on World Wide Web
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Issues in data stream management
ACM SIGMOD Record
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems (TOCS)
Dynamically maintaining frequent items over a data stream
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Using association rules for fraud detection in web advertising networks
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficient computation of frequent and top-k elements in data streams
ICDT '05 Proceedings of the 10th international conference on Database Theory
Finding frequent items in a data stream has been well studied recently. However, some applications, such as HTTP log analysis, also need to analyze the correlations among frequent items in data streams. In this paper, we investigate the problem of finding correlated items based on the concept of unexpectedness: two items x and y are correlated if both are frequent and their actual number of co-occurrences in the data stream differs significantly from the expected value, which can be computed from the frequencies of x and y. Building on the Space-Saving algorithm [1], we propose a new one-pass algorithm, Stream-Correlation, to discover correlated item pairs. The key part of our algorithm is efficiently estimating the co-occurrence frequency of items using little memory. The estimation error can be tightly bounded by controlling the amount of memory used. Experimental results show the effectiveness and efficiency of the algorithm.
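The unexpectedness idea can be illustrated with a minimal Python sketch. It is not the Stream-Correlation algorithm itself, whose details the abstract does not give. The sketch assumes the stream arrives as transactions (sets of items), that the expected co-occurrence count under independence is f(x) * f(y) / N, and that "significantly different" means a ratio test against a hypothetical `min_ratio` threshold. The names `SpaceSaving`, `correlated_pairs`, `min_support`, and `min_ratio` are illustrative, and using a second Space-Saving summary for pairs is a simplification chosen here.

```python
from itertools import combinations


class SpaceSaving:
    """Bounded-memory frequency summary with at most `capacity` counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # key -> estimated count (may overestimate)

    def add(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
        else:
            # Evict the key with the smallest count; the new key inherits
            # that count plus one, so estimates never undercount.
            victim = min(self.counts, key=self.counts.get)
            self.counts[key] = self.counts.pop(victim) + 1


def correlated_pairs(stream, capacity=1000, min_support=0.1, min_ratio=1.5):
    """Return {(x, y): (observed, expected)} for pairs that look correlated.

    Assumptions (not from the paper): transactions as input, independence
    baseline expected = f(x) * f(y) / N, and a simple ratio threshold.
    """
    items = SpaceSaving(capacity)  # item frequencies
    pairs = SpaceSaving(capacity)  # co-occurrence frequencies
    n = 0
    for transaction in stream:  # single pass over the stream
        n += 1
        unique = sorted(set(transaction))
        for item in unique:
            items.add(item)
        for pair in combinations(unique, 2):
            pairs.add(pair)

    min_count = max(min_support * n, 1)
    result = {}
    # Only pairs still held in the summary are examined, so pairs that
    # never co-occur are not reported by this sketch.
    for (x, y), observed in pairs.counts.items():
        fx, fy = items.counts.get(x, 0), items.counts.get(y, 0)
        if fx < min_count or fy < min_count:
            continue  # both items must be frequent
        expected = fx * fy / n
        ratio = observed / expected
        if ratio >= min_ratio or ratio <= 1 / min_ratio:
            result[(x, y)] = (observed, expected)
    return result


stream = [("a", "b")] * 5 + [("c",)] * 5
print(correlated_pairs(stream, capacity=10, min_support=0.3))
# {('a', 'b'): (5, 2.5)}
```

In the toy stream, a and b each appear in 5 of 10 transactions, so independence predicts 2.5 co-occurrences, while 5 are observed. The linear scan for the minimum counter keeps the sketch short; a production summary would use a structure that finds the minimum in constant time.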