Mining Approximate Frequency Itemsets over Data Streams Based on D-Hash Table

Authors:
Chunhua Ju; Gang You
Venue:
SNPD '09 Proceedings of the 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing
Year:
2009

Abstract

Frequent itemset (or frequent pattern) mining, a basic step in data stream mining, has attracted growing attention from researchers. Because data streams are uncertain and continuous, the time and space efficiency of many mining algorithms is unacceptable. In this paper, a hash table is introduced as the synopsis data structure, which reduces the memory footprint of Lossy Counting algorithms. In addition, an algorithm for frequent itemset mining based on the D-Hashed Table (MFS-HT for short) is proposed to obtain the items whose frequency count exceeds a user-specified threshold in data streams. Compared with Lossy Counting and a similar algorithm called Mining Frequent Itemsets over Data Streams by Matrix (MISM for short), the experimental results show that MFS-HT is more efficient in both time and space.
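
For orientation, the baseline named in the abstract can be sketched as follows. This is a minimal illustration of the standard Lossy Counting scheme for single items, with a plain Python dict standing in for the hash-table synopsis. It is not the MFS-HT algorithm, whose D-Hashed Table layout the abstract does not describe. The function names `lossy_count` and `frequent_items`, and the parameters `epsilon` (error bound) and `support` (the user-specified threshold), are assumptions made for this example.

```python
import math


def lossy_count(stream, epsilon):
    """Standard Lossy Counting over single items (illustrative sketch only).

    Returns (counts, n), where counts maps item -> [count, max_undercount]
    and n is the number of items seen.
    """
    width = math.ceil(1 / epsilon)   # bucket width
    counts = {}                      # dict used as the hash-table synopsis
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)          # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in up to bucket - 1 earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


def frequent_items(counts, n, support, epsilon):
    """Report items whose stored count is at least (support - epsilon) * n."""
    return {k: f for k, (f, d) in counts.items() if f >= (support - epsilon) * n}


stream = ["a", "b", "a", "c", "a", "d", "a", "e"]
counts, n = lossy_count(stream, epsilon=0.25)
result = frequent_items(counts, n, support=0.5, epsilon=0.25)
print(result)
```

With this short stream, only `a` survives the pruning at the two bucket boundaries, so the synopsis holds a single entry even though five distinct items were seen. Pruning of this kind is what keeps the memory footprint bounded in Lossy Counting.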