Interactive stream mining of maximal frequent itemsets allowing flexible time intervals and support thresholds

  • Authors:
  • Ming-Yen Lin;Sue-Chen Hsueh;Chien-Hsiang Tung

  • Affiliations:
  • Feng Chia University, Taichung, Taiwan;Chaoyang University of Technology, Taichung, Taiwan;Feng Chia University, Taichung, Taiwan

  • Venue:
  • Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2010

Abstract

Stream data mining extracts useful patterns or knowledge from the continuous, rapidly arriving data elements of modern applications. The discovery of frequent patterns in data streams is generally constrained by bounded memory and limited computation time. Most algorithms for mining frequent itemsets in streaming transactions assume a fixed minimum support threshold and a fixed time interval. The support threshold, however, should be changeable to accommodate users' needs and the characteristics of the incoming data. In addition, allowing users to specify the time period of interest may enhance the discovered knowledge. Still, the number of frequent itemsets may be too large to reveal trends or changes. Thus, mining maximal frequent itemsets (MFIs) with respect to a changeable support threshold in a user-specified period becomes a favorable objective in stream data mining. In this paper, we propose an algorithm named VIMFI for mining MFIs in a data stream, allowing an arbitrary time interval and support threshold. A bounded memory space is allocated for summarizing all the transactions. VIMFI appends transactions to the summary structure and compresses the structure when it becomes full. The transactions that fall within the specified interval are extracted and then mined for the desired MFIs. Experiments using both synthetic and real-world datasets demonstrate that VIMFI efficiently mines MFIs in data streams with flexible time intervals and changeable support thresholds.
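
To make the query model concrete, the following is a minimal sketch of what it means to ask for MFIs over an arbitrary time interval and support threshold. It is not the VIMFI algorithm: it keeps every transaction, enumerates all subsets by brute force, and has no bounded summary structure or compression step. The function name `maximal_frequent_itemsets`, the `(timestamp, items)` stream layout, and the fractional `min_support` parameter are assumptions made for illustration only. An itemset is treated as maximal when it is frequent and no proper superset of it is frequent.

```python
from itertools import combinations


def maximal_frequent_itemsets(stream, start, end, min_support):
    """Brute-force MFIs over transactions with start <= timestamp <= end.

    stream: iterable of (timestamp, items) pairs (hypothetical layout).
    min_support: fraction in (0, 1] of the transactions inside the interval.
    """
    # Keep only the transactions that fall inside the requested interval.
    window = [frozenset(items) for ts, items in stream if start <= ts <= end]
    if not window:
        return []

    min_count = min_support * len(window)

    # Count every non-empty subset of every transaction (exponential cost,
    # acceptable only for tiny illustrative inputs).
    counts = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                counts[key] = counts.get(key, 0) + 1

    frequent = [s for s, c in counts.items() if c >= min_count]

    # An itemset is maximal if no frequent proper superset exists.
    maximal = [s for s in frequent if not any(s < other for other in frequent)]
    return sorted(maximal, key=lambda s: (len(s), sorted(s)))


# Toy stream: five timestamped transactions (made-up data).
stream = [
    (1, {"a", "b", "c"}),
    (2, {"a", "b"}),
    (3, {"a", "c"}),
    (4, {"b", "c"}),
    (5, {"a", "b", "c"}),
]

whole_high = maximal_frequent_itemsets(stream, 1, 5, 0.6)
whole_low = maximal_frequent_itemsets(stream, 1, 5, 0.4)
middle = maximal_frequent_itemsets(stream, 2, 4, 0.6)
```

On this toy stream, the full interval at a 60% threshold yields the three pairs {a, b}, {a, c}, and {b, c}; lowering the threshold to 40% collapses them into the single itemset {a, b, c}; and restricting the interval to timestamps 2 through 4 at 60% yields only the singletons. The same stored data thus answers different queries as the interval and threshold change, which is the flexibility the abstract describes.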