A persistent HY-Tree to efficiently support itemset mining on large datasets

Authors:
Elena Baralis; Tania Cerquitelli; Silvia Chiusano
Affiliations:
Politecnico di Torino, Torino, Italy (all authors)
Venue:
Proceedings of the 2010 ACM Symposium on Applied Computing
Year:
2010

Abstract

This paper presents the HY-Tree, a persistent tree structure that provides a compact representation of a transactional dataset for frequent itemset mining. The HY-Tree has a hybrid structure that adapts to different data distributions. The data representation is complete, since no support threshold is enforced when the HY-Tree is created. A variety of itemset mining algorithms (e.g., LCM v.2, nonordFP) can exploit the HY-Tree. It supports the data retrieval step of the itemset mining process by reducing both the I/O cost and the memory requirements for data loading. Experiments on large synthetic datasets show the compactness of the HY-Tree data representation and the efficiency and scalability of the mining algorithms it supports.
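
To make the idea of a complete, threshold-free representation concrete, the following is a minimal in-memory sketch of a generic prefix tree over transactions. It is not the HY-Tree itself: the abstract does not describe the hybrid layout or the on-disk format, so the names `Node`, `build_prefix_tree`, `support`, and `count_nodes`, as well as the frequency-based item order, are assumptions made purely for illustration. The sketch only shows that when no item is pruned at build time, the support of any itemset can still be computed afterwards for any threshold.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, a transaction count, and child links."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_prefix_tree(transactions):
    """Build a prefix tree over ALL items; no support threshold is applied."""
    # Item frequencies are used only to pick a sort order that favours shared prefixes.
    freq = Counter(item for t in transactions for item in set(t))

    def order(item):
        return (-freq[item], item)

    root = Node()
    for t in transactions:
        node = root
        node.count += 1
        for item in sorted(set(t), key=order):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, order


def support(root, order, itemset):
    """Count the transactions that contain every item of `itemset`."""
    wanted = sorted(set(itemset), key=order)
    if not wanted:
        return root.count

    def walk(node, idx):
        total = 0
        for child in node.children.values():
            if child.item == wanted[idx]:
                # Last wanted item matched: every transaction through child qualifies.
                total += child.count if idx + 1 == len(wanted) else walk(child, idx + 1)
            elif order(child.item) < order(wanted[idx]):
                # The wanted item may still appear deeper on this path.
                total += walk(child, idx)
            # Otherwise the child sorts after the wanted item, which cannot occur below it.
        return total

    return walk(root, 0)


def count_nodes(node):
    """Number of item nodes in the tree (the root is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["d"]]
root, order = build_prefix_tree(transactions)
print(support(root, order, {"a", "c"}))  # transactions containing both a and c
print(support(root, order, {"d"}))       # a rare item is still retrievable
print(count_nodes(root), sum(len(t) for t in transactions))  # nodes vs. item occurrences
```

Because shared prefixes are stored once, the tree has fewer nodes than the dataset has item occurrences, which is the kind of compactness the abstract refers to. A persistent structure would additionally need a disk layout and selective loading, which this sketch does not attempt.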