An improvement for dEclat algorithm

Authors:
Tuan A. Trieu; Yoshitoshi Kunieda
Affiliations:
Ritsumeikan University, Kyoto, Japan; Ritsumeikan University, Kyoto, Japan
Venue:
Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
Year:
2012

Abstract

The diffset format (the difference of two sets) drastically reduces the running time and memory usage of the Eclat algorithm; Eclat using the diffset format is called dEclat. On some sparse datasets, however, the diffset format loses its advantage over the tidset format (the set of transaction IDs), and in that case it has been suggested to start with tidsets and switch to diffsets later. In this paper, we present a novel approach, the combination of tidset and diffset, which uses both formats to represent transaction databases in frequent itemset mining. This approach can fully exploit the advantages of both tidsets and diffsets, and it does not require converting tidsets to the diffset format. Preliminary results show that Eclat using this combination approach used less memory and ran faster than dEclat on most datasets. We also introduce an improvement to the dEclat algorithm: by sorting diffsets and tidsets, its memory usage and running time can be reduced significantly.
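
To make the two formats concrete, the sketch below computes itemset support once from tidsets and once from diffsets, using the standard dEclat identities d(PXY) = t(PX) - t(PY) = d(PY) - d(PX) and support(PXY) = support(PX) - |d(PXY)|. It is a minimal illustration only, not the combination approach or the sorting improvement proposed in the paper; all function names and the toy database are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Set

Tidset = FrozenSet[int]


def build_tidsets(transactions: List[Set[str]]) -> Dict[str, Tidset]:
    """Vertical layout: map each item to the IDs of the transactions containing it."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return {item: frozenset(tids) for item, tids in tidsets.items()}


def support_from_tidsets(t_px: Tidset, t_py: Tidset) -> int:
    """Tidset route: t(PXY) = t(PX) & t(PY); the support is its size."""
    return len(t_px & t_py)


def diffset_from_tidsets(t_px: Tidset, t_py: Tidset) -> Tidset:
    """d(PXY) = t(PX) - t(PY): transactions that contain PX but not PY."""
    return t_px - t_py


def diffset_from_diffsets(d_px: Tidset, d_py: Tidset) -> Tidset:
    """d(PXY) = d(PY) - d(PX), both diffsets taken relative to the same prefix P."""
    return d_py - d_px


def support_from_diffset(support_px: int, d_pxy: Tidset) -> int:
    """support(PXY) = support(PX) - |d(PXY)|."""
    return support_px - len(d_pxy)


# Hypothetical toy database; transaction IDs are the list indices 0..4.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
t = build_tidsets(transactions)

# Level 2: diffsets derived directly from tidsets (prefix "a").
d_ab = diffset_from_tidsets(t["a"], t["b"])
d_ac = diffset_from_tidsets(t["a"], t["c"])
sup_ab = support_from_diffset(len(t["a"]), d_ab)

# Level 3: diffset derived from two diffsets that share the prefix "a".
d_abc = diffset_from_diffsets(d_ab, d_ac)
sup_abc = support_from_diffset(sup_ab, d_abc)

print("support(ab) via tidsets:", support_from_tidsets(t["a"], t["b"]))
print("support(ab) via diffset:", sup_ab)
print("support(abc) via diffsets:", sup_abc)
```

In this toy database the diffsets are smaller than the corresponding tidsets because the items co-occur often. On sparse data the opposite can hold at the first levels, which is the situation the abstract describes as the diffset format losing its advantage.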