BISC: A bitmap itemset support counting approach for efficient frequent itemset mining

Authors:
Jinlin Chen;Keli Xiao
Affiliations:
Queens College, City University of New York, Flushing, NY;Rutgers, The State University of New Jersey, Newark, NJ
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2010


Abstract

The performance of a depth-first frequent itemset (FI) mining algorithm is closely related to the total number of recursions. In previous approaches this number is mainly determined by the total number of FIs, which results in poor performance when a large number of FIs are involved. To solve this problem, a three-strategy adaptive algorithm, bitmap itemset support counting (BISC), is presented. The core strategy, BISC1, is used in the innermost steps of the recursion. For a database D with only s frequent items, a depth-first approach needs up to s levels of recursion to detect all the FIs (up to 2^s). BISC1 completely replaces these recursions with a special summation that directly calculates the supports of all 2^s possible candidate itemsets. With BISC1 the run-time is entirely independent of the database after one database scan, and the per-candidate cost is only s. To offset the exponential growth of cost (both time and space) with BISC1 as s increases, a second strategy, BISC2, is introduced to effectively double the acceptable range of s. BISC2 divides an itemset into a prefix and a suffix and improves performance by pruning all the itemsets with infrequent prefixes. If the total number of frequent items in D is high, the classic database projection strategy is used. In this case a single run of BISC (1 or 2) is applied to the first s items. For each of the remaining items, a projected database is created and the mining process proceeds recursively. To achieve optimal performance, BISC adaptively decides which strategy to use based on the dataset and the minimum support. Experiments show that BISC outperforms previous approaches on all the datasets tested. Even though this does not guarantee that BISC will always perform best, the result is impressive given that most existing algorithms are efficient only on some types of datasets. The memory usage of BISC is also comparable to that of other algorithms.
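The abstract does not spell out the "special summation" used by BISC1. As a rough illustration only, the sketch below uses a standard superset-sum over bitmaps, which has the properties the abstract describes: one database scan, then s passes over an array of 2^s counters, for a per-candidate cost of s. This is an assumption about how such a summation could look, not the authors' implementation, and the function names `bitmap_support_counts` and `frequent_itemsets` are hypothetical.

```python
def bitmap_support_counts(transactions, items):
    """Return a list `support` of length 2**s, where support[mask] is the
    number of transactions containing every item whose bit is set in `mask`.

    Illustrative sketch only; assumes `items` lists the s frequent items.
    """
    s = len(items)
    bit_of = {item: 1 << i for i, item in enumerate(items)}

    # One database scan: count how many transactions map to each exact bitmap.
    support = [0] * (1 << s)
    for transaction in transactions:
        mask = 0
        for item in transaction:
            mask |= bit_of.get(item, 0)  # items outside `items` are ignored
        support[mask] += 1

    # Superset summation: s passes over the 2**s counters. After the pass for
    # bit i, each counter also includes the transactions that additionally
    # contain item i. No further access to the database is needed.
    for i in range(s):
        bit = 1 << i
        for mask in range(1 << s):
            if not mask & bit:
                support[mask] += support[mask | bit]
    return support


def frequent_itemsets(transactions, items, min_support):
    """Map each non-empty frequent itemset (a frozenset) to its support."""
    support = bitmap_support_counts(transactions, items)
    result = {}
    for mask in range(1, len(support)):
        if support[mask] >= min_support:
            itemset = frozenset(
                item for i, item in enumerate(items) if (mask >> i) & 1
            )
            result[itemset] = support[mask]
    return result


# Toy usage with made-up data: four transactions over three items.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
found = frequent_itemsets(transactions, ["a", "b", "c"], min_support=2)
```

The array of 2^s counters is what makes both time and space grow exponentially in s, which is the cost the abstract says BISC2 and database projection are meant to offset.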