A persistent HY-Tree to efficiently support itemset mining on large datasets

  • Authors: Elena Baralis, Tania Cerquitelli, Silvia Chiusano
  • Affiliation: Politecnico di Torino, Torino, Italy (all authors)
  • Venue: Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year: 2010

Abstract

This paper presents the HY-Tree, a persistent tree structure that provides a compact representation of a transactional dataset for frequent itemset mining. The HY-Tree has a hybrid structure that adapts to different data distributions. The representation is complete: no support threshold is enforced while the HY-Tree is built, so no itemset is discarded at creation time. The HY-Tree can be exploited by a variety of itemset mining algorithms (e.g., LCM v.2, nonordFP). It supports the data retrieval step of the mining process by reducing both the I/O cost and the memory required for data loading. Experiments on large synthetic datasets show that the HY-Tree representation is compact and that the mining algorithms it supports are efficient and scale to large datasets.
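The idea of a complete, threshold-free tree representation can be illustrated with a generic in-memory prefix tree. The sketch below is not the HY-Tree: the abstract does not describe its hybrid node layout or its on-disk format, and the names `Node`, `build_prefix_tree`, and `support` are hypothetical. It only shows, under those assumptions, how storing every transaction in full lets a miner choose any support threshold later.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One prefix-tree node: an item, a count, and children keyed by item."""
    item: object = None
    count: int = 0
    children: dict = field(default_factory=dict)


def build_prefix_tree(transactions):
    """Insert every transaction in full; no support threshold is applied."""
    root = Node()
    for transaction in transactions:
        root.count += 1  # the root counts all transactions
        node = root
        # A canonical item order lets transactions with a common prefix share nodes.
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def support(node, itemset):
    """Return how many stored transactions contain every item of `itemset`."""
    remaining = sorted(set(itemset))
    if not remaining:
        # Every transaction passing through this node matches.
        return node.count
    total = 0
    for item, child in node.children.items():
        if item == remaining[0]:
            total += support(child, remaining[1:])
        elif item < remaining[0]:
            # Items along a path are ascending, so the target may still lie deeper.
            total += support(child, remaining)
        # Children with a larger item cannot contain remaining[0]; skip them.
    return total


# Illustrative data only (assumed, not from the paper).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
tree = build_prefix_tree(transactions)
print(support(tree, ["a", "b"]))  # 2
```

Because nothing is pruned at build time, the same tree answers queries for any minimum support; a threshold of 2 or of 3 is applied only when the counts are read back.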