Discovering correlated items in data streams

Authors:
Xingzhi Sun; Ming Chang; Xue Li; Maria E. Orlowska
Affiliations:
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Australia (all four authors)
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007

References (13):

Beyond market baskets: generalizing association rules to correlations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

On the security of pay-per-click and other Web advertising schemes
WWW '99 Proceedings of the eighth international conference on World Wide Web

Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming

Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms

A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)

What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Issues in data stream management
ACM SIGMOD Record

New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
ACM Transactions on Computer Systems (TOCS)

Dynamically maintaining frequent items over a data stream
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

Using association rules for fraud detection in web advertising networks
VLDB '05 Proceedings of the 31st international conference on Very large data bases

Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Efficient computation of frequent and top-k elements in data streams
ICDT'05 Proceedings of the 10th international conference on Database Theory

Abstract

The problem of finding frequent items in a data stream has been well studied in recent years. However, some applications, such as HTTP log analysis, also need to analyze the correlations among frequent items in data streams. In this paper, we investigate the problem of finding correlated items based on the concept of unexpectedness: two items x and y are correlated if both are frequent and their actual number of co-occurrences in the data stream differs significantly from the expected value, which can be computed from the frequencies of x and y. Building on the Space-Saving algorithm [1], we propose a new one-pass algorithm, Stream-Correlation, to discover correlated item pairs. The key part of the algorithm is to estimate the co-occurrence frequency of items efficiently using a small amount of memory. The estimation error can be tightly bounded by controlling the memory space. Experimental results show the effectiveness and efficiency of the algorithm.
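
To make the notion of unexpectedness concrete, the following Python sketch may help. It is not the Stream-Correlation algorithm, whose details the abstract does not give. It only combines a Space-Saving item summary with an exact pair counter, under several assumptions that are not in the abstract: each stream element is a small set of items, two items co-occur when they appear in the same element, the expected co-occurrence count under independence is f(x) * f(y) / N, and "significantly different" means the observed/expected ratio is at least `min_ratio` or at most `1 / min_ratio`. The names `SpaceSaving`, `correlated_pairs`, `min_support` and `min_ratio` are hypothetical.

```python
from itertools import combinations


class SpaceSaving:
    """Space-Saving frequency summary holding at most `capacity` counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # item -> estimated count (never an underestimate)
        self.errors = {}  # item -> maximum possible overestimation

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count as its possible error.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor


def correlated_pairs(transactions, capacity, min_support, min_ratio):
    """Return (x, y, observed, expected) for pairs of frequent items whose
    co-occurrence count deviates from the independence expectation."""
    items = SpaceSaving(capacity)
    pair_counts = {}  # exact pair counts: NOT memory-bounded, illustration only
    n = 0
    for transaction in transactions:
        n += 1
        unique = sorted(set(transaction))
        for item in unique:
            items.add(item)
        for pair in combinations(unique, 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1

    frequent = sorted(x for x, c in items.counts.items() if c >= min_support * n)
    result = []
    for x, y in combinations(frequent, 2):
        observed = pair_counts.get((x, y), 0)
        expected = items.counts[x] * items.counts[y] / n  # assumed independence
        ratio = observed / expected
        if ratio >= min_ratio or ratio <= 1 / min_ratio:
            result.append((x, y, observed, expected))
    return result


# Toy stream: "a" and "b" mostly appear together, "c" always appears alone.
stream = [["a", "b"]] * 40 + [["c"]] * 40 + [["a"], ["b"]] * 10
found = {(x, y): (o, e) for x, y, o, e in correlated_pairs(stream, 10, 0.3, 1.5)}
```

In this toy stream, the pair (a, b) co-occurs more often than independence would predict, while (a, c) and (b, c) never co-occur although all three items are frequent. The exact `pair_counts` dictionary is the part that grows without bound; according to the abstract, estimating these co-occurrence counts in small memory is the key contribution of Stream-Correlation.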