A test paradigm for detecting changes in transactional data streams

Authors:
Willie Ng;Manoranjan Dash
Affiliations:
Centre for Advanced Information Systems, Nanyang Technological University, Singapore;Centre for Advanced Information Systems, Nanyang Technological University, Singapore
Venue:
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Year:
2008

Citing 23
Cited 1

Random sampling with a reservoir

ACM Transactions on Mathematical Software (TOMS)
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Sampling from a moving window over streaming data

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Extracting Share Frequent Itemsets with Infrequent Subsets

Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Random Sampling from Database Files: A Survey

Proceedings of the 5th International Conference SSDBM on Statistical and Scientific Database Management
Maintaining time-decaying stream aggregates

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Evaluation of sampling for data mining of association rules

RIDE '97 Proceedings of the 7th International Workshop on Research Issues in Data Engineering (RIDE '97) High Performance Database Management for Large-Scale Applications
Efficient Progressive Sampling for Association Rules

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Online Data Mining for Co-Evolving Time Sequences

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Mining Association Rules with Weighted Items

IDEAS '98 Proceedings of the 1998 International Symposium on Database Engineering & Applications
Mining High Utility Itemsets

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Finding recent frequent itemsets adaptively over online data streams

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Weighted Association Rule Mining using weighted support and significance framework

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Sampling algorithms in a stream operator

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Research issues in data stream association rule mining

ACM SIGMOD Record
Efficient Reservoir Sampling for Transactional Data Streams

ICDMW '06 Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Detecting change in data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
False positive or false negative: mining frequent itemsets from high speed transactional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
A two-phase algorithm for fast discovery of high utility itemsets

PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining

A survey on concept drift adaptation

ACM Computing Surveys (CSUR)

Quantified Score

Hi-index	0.00

Visualization

Abstract

A pattern is considered useful if it can be used to help a person to achieve his goal. Mining data streams for useful patterns is important in many applications. However, data stream can change their behavior over time and, when significant change occurs, much harm is done to the mining result if it is not properly handled. In the past, there have been many studies mainly on adapting to changes in data streams.We contend that adapting to changes is simply not enough. The ability to detect and characterize change is also essential in many applications, for example intrusion detection, network traffic analysis, data streams from intensive care units etc. Detecting changes is nontrivial. In this paper, an online algorithm for change detection in utility mining is proposed. In order to provide a mechanism for making quantitative description of the detected change, we adopt the statistical test.We believe there is the opportunity for an immensely rewarding synergy between data mining and statistic. Different statistical significance tests are evaluated and our study shows that the Chi-square test is the most suitable for enumerated or count data (as is the case for high utility itemsets). We demonstrate the effectiveness of the proposed method by testing it on IBM QUEST market-basket data.