Towards a variable size sliding window model for frequent itemset mining over data streams

Authors:
Mahmood Deypir;Mohammad Hadi Sadreddini;Sattar Hashemi
Affiliations:
University of Aeronautical Science & Technology, P.O. Box 13846-63113, Tehran, Iran;Department of Computer Science and Engineering, School of Engineering, Shiraz University, Shiraz, Iran;Department of Computer Science and Engineering, School of Engineering, Shiraz University, Shiraz, Iran
Venue:
Computers and Industrial Engineering
Year:
2012

Abstract

The sliding window is a widely used model for data stream mining because it emphasizes recent data and has a bounded memory requirement. The main idea behind a transactional sliding window is to keep a fixed-size window over a data stream. The window size is kept constant by removing old transactions from the window when new transactions arrive. Older transactions are removed from the window irrespective of whether a significant change has occurred. Another challenge of the sliding window model is determining the window size. The classic approach is to obtain it from the user. To determine a precise window size, the user must have prior knowledge of the time and scale of changes within the data stream. However, because data streams change unpredictably, this prior knowledge is not easy to obtain. Moreover, using a fixed window size throughout data stream mining degrades the model's ability to reflect recent changes. Based on these observations, this study relaxes the notion of window size and proposes a new algorithm named VSW (Variable Size sliding Window frequent itemset mining), which is suitable for observing recent changes in the set of frequent itemsets over data streams. The window size is determined dynamically based on the amount of concept change that occurs within the arriving data stream. The window expands as the concept becomes stable and shrinks when a concept change occurs. In this study, it is shown that if stale transactions are removed from the window after a concept change, the updated frequent itemsets always belong to the most recent concept. Experimental evaluations on both synthetic and real data show that our algorithm effectively detects concept changes, adjusts the window size, and adapts itself to new concepts along the data stream.
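
The expand-and-shrink behaviour described in the abstract can be illustrated with a small Python sketch. This is not the VSW algorithm itself: the abstract does not specify how concept change is measured or how the window is stored, so everything below is an assumption made for illustration. In particular, the class name `VariableWindow`, the parameters `batch_size` and `change_threshold`, the brute-force itemset counting limited to `max_len` items, and the use of Jaccard distance between sets of frequent itemsets as the change measure are all hypothetical choices.

```python
from collections import Counter, deque
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=2):
    """Return itemsets (frozensets) with relative support >= min_support.

    Brute-force counting of all itemsets up to max_len items; chosen for
    clarity only, not efficiency.
    """
    transactions = list(transactions)
    if not transactions:
        return set()
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(transactions)
    return {s for s, c in counts.items() if c >= threshold}


def concept_change(old, new):
    """Assumed change measure: Jaccard distance between two sets of
    frequent itemsets (0.0 = identical, 1.0 = disjoint)."""
    union = old | new
    if not union:
        return 0.0
    return 1.0 - len(old & new) / len(union)


class VariableWindow:
    """Hypothetical variable-size window, processed batch by batch."""

    def __init__(self, min_support=0.5, batch_size=4, change_threshold=0.5):
        self.min_support = min_support
        self.batch_size = batch_size
        self.change_threshold = change_threshold
        self.window = deque()   # transactions of the current concept
        self.pending = []       # transactions waiting to form a batch

    def add(self, transaction):
        self.pending.append(transaction)
        if len(self.pending) < self.batch_size:
            return
        batch, self.pending = self.pending, []
        if self.window:
            old = frequent_itemsets(self.window, self.min_support)
            new = frequent_itemsets(batch, self.min_support)
            if concept_change(old, new) >= self.change_threshold:
                # Concept change detected: shrink by dropping stale data.
                self.window.clear()
        # Stable concept (or fresh start): expand with the newest batch.
        self.window.extend(batch)

    def frequent(self):
        return frequent_itemsets(self.window, self.min_support)
```

For example, feeding eight transactions `["a", "b"]` grows the window to eight entries, and a following batch of four `["c", "d"]` transactions triggers the change test, so the window shrinks to those four and `frequent()` reports only itemsets over `c` and `d`. The sketch lets the window grow without limit while the concept is stable and discards the entire old window on a change; a practical implementation would need a memory cap and possibly a finer-grained choice of which transactions to discard.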