DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets

  • Authors:
  • Bay Vo;Tzung-Pei Hong;Bac Le

  • Affiliations:
  • Department of Computer Science, Ho Chi Minh City University of Technology, Ho Chi Minh, Viet Nam;Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, ROC and Department of Computer Science and Engineering, National Sun Yat-sen Univer ...;Department of Computer Science, University of Science, Ho Chi Minh, Viet Nam

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

Frequent closed itemsets (FCI) play an important role in quickly pruning redundant rules, and many algorithms for mining FCI have therefore been developed. Algorithms based on vertical data formats have the advantage that they need to scan the database only once and can compute the support of itemsets quickly. In recent years, the BitTable (Dong & Han, 2007) and IndexBitTable (Song, Yang, & Xu, 2008) approaches have been applied to mining frequent itemsets with significant results. However, they always use a fixed-size Bit-Vector for each item (its length equal to the number of transactions in the database). This consumes more memory for storing Bit-Vectors and more time for computing the intersections among them. Moreover, they apply only to mining frequent itemsets; no BitTable-based algorithm for mining FCI has been proposed. This paper introduces a new method for mining FCI from transaction databases. First, the Dynamic Bit-Vector (DBV) approach is presented, together with algorithms for quickly computing the intersection of two DBVs. A lookup table is used to quickly compute the support of an itemset (the number of 1 bits in its DBV). Next, the subsumption concept, which saves memory and computing time, is discussed. Finally, an algorithm based on DBV and the subsumption concept is proposed for fast mining of frequent closed itemsets. We compare our method with CHARM and find that the proposed algorithm is more efficient than CHARM in both mining time and memory usage.
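
The abstract does not spell out the DBV layout, so the following is only a minimal sketch of the idea under stated assumptions: a DBV is taken to be a start offset plus the bytes lying between the first and last nonzero byte of the full bit-vector, so that leading and trailing zero bytes are never stored. The names `DBV`, `make_dbv`, `intersect`, `support`, and `POPCOUNT` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Lookup table: number of 1 bits for every possible byte value (0..255).
POPCOUNT = [bin(b).count("1") for b in range(256)]


@dataclass(frozen=True)
class DBV:
    """Assumed layout: zero bytes at both ends of the bit-vector are dropped."""
    pos: int      # index of the first stored byte within the full bit-vector
    data: bytes   # bytes from the first to the last nonzero byte


EMPTY = DBV(0, b"")


def make_dbv(tids):
    """Build a DBV from an iterable of 0-based transaction ids."""
    tids = sorted(set(tids))
    if not tids:
        return EMPTY
    first, last = tids[0] // 8, tids[-1] // 8
    buf = bytearray(last - first + 1)
    for t in tids:
        buf[t // 8 - first] |= 1 << (t % 8)
    return DBV(first, bytes(buf))


def intersect(a, b):
    """AND two DBVs over their overlapping byte range only."""
    start = max(a.pos, b.pos)
    end = min(a.pos + len(a.data), b.pos + len(b.data))
    if start >= end:
        return EMPTY  # no overlapping bytes, so no common transaction
    raw = bytes(a.data[i - a.pos] & b.data[i - b.pos] for i in range(start, end))
    # Trim zero bytes at both ends so the result is again a compact DBV.
    lo, hi = 0, len(raw)
    while lo < hi and raw[lo] == 0:
        lo += 1
    while hi > lo and raw[hi - 1] == 0:
        hi -= 1
    if lo == hi:
        return EMPTY
    return DBV(start + lo, raw[lo:hi])


def support(v):
    """Count the 1 bits of a DBV with one table lookup per stored byte."""
    return sum(POPCOUNT[byte] for byte in v.data)


a = make_dbv([3, 17, 18, 40])   # transactions containing item A
b = make_dbv([17, 40, 90])      # transactions containing item B
ab = intersect(a, b)            # transactions containing both A and B
print(ab.pos, len(ab.data), support(ab))
```

In this sketch the intersection touches only the four bytes where the two vectors overlap, whereas a fixed-size Bit-Vector would always process one bit per transaction in the database.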