An improvement for dEclat algorithm

  • Authors:
  • Tuan A. Trieu;Yoshitoshi Kunieda

  • Affiliations:
  • Ritsumeikan University, Kyoto, Japan;Ritsumeikan University, Kyoto, Japan

  • Venue:
  • Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2012

Abstract

The diffset format (the difference of two sets) has drastically reduced the running time and memory usage of the Eclat algorithm; Eclat using the diffset format is called dEclat. However, on some sparse datasets the diffset format loses its advantage over the tidset format (the set of transaction IDs), and in this case it is suggested to start with the tidset format and switch to the diffset format later. In this paper, we present a novel approach, a combination of tidset and diffset, which uses both formats to represent transaction databases in frequent itemset mining. This approach can fully exploit the advantages of both tidset and diffset. Furthermore, it does not require conversion of tidsets to the diffset format. Preliminary results show that Eclat using this combination approach used less memory and was faster than dEclat on most datasets. We also introduce an improvement for the dEclat algorithm: by sorting diffsets and tidsets, the memory usage and running time of dEclat could be reduced significantly.
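
The two formats contrasted in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definitions (a tidset is the set of transaction IDs containing an itemset; the diffset of {a, b} relative to {a} is the tidset of {a} minus the tidset of {b}). The toy database and all names are hypothetical, and the sketch does not reproduce the combination approach or the sorting improvement proposed in the paper.

```python
# Toy transaction database (hypothetical data): transaction ID -> items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"a", "b", "c"},
    5: {"b"},
}


def tidset(item):
    """Return the set of IDs of the transactions that contain `item`."""
    return {tid for tid, items in transactions.items() if item in items}


t_a = tidset("a")
t_b = tidset("b")

# Tidset format: intersect the tidsets; the support is the size of the result.
t_ab = t_a & t_b
support_ab_tidset = len(t_ab)

# Diffset format: store only the IDs lost when {a} is extended with b;
# the support is the parent's support minus the size of the diffset.
d_ab = t_a - t_b
support_ab_diffset = len(t_a) - len(d_ab)

print(sorted(t_ab), support_ab_tidset)
print(sorted(d_ab), support_ab_diffset)
```

Both formats give the same support for {a, b}, but here the diffset holds fewer IDs than the tidset because a and b co-occur often. When items rarely co-occur, as in the sparse datasets the abstract mentions, the diffset can be the larger of the two.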