Algorithms for mining frequent itemsets in static and dynamic datasets

  • Authors:
  • R. Hernández-León;J. Hernández-Palancar;Jesús A. Carrasco-Ochoa;José Fco. Martínez-Trinidad

  • Affiliations:
  • (Correspd. E-mail: raudel@ccc.inaoep.mx) Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico and Advanced Technologies Application Center, Havan ...;Advanced Technologies Application Center, Havana, Cuba;Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico;Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2010

Abstract

In this paper, two algorithms for mining frequent itemsets in large sparse datasets are proposed. The first, named Compressed Arrays (CA), processes datasets that do not change over time (static datasets), while the second, named Dynamic Compressed Arrays (DCA) and based on the ideas of the former, processes datasets that change over time through the addition or deletion of transactions (dynamic datasets). Both algorithms introduce a novel way of using equivalence classes of itemsets: they perform a breadth-first search through the classes and store the class prefix support in compressed arrays, which allows fast computation of itemset support. Moreover, unlike previous algorithms for dynamic datasets, which store the full dataset in main memory without reusing the current frequent itemsets, the DCA algorithm stores the current frequent itemsets in binary files, grouped into equivalence classes, and reuses them to calculate the new frequent itemsets.
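
To make the idea of a breadth-first search over equivalence classes concrete, the following is a minimal Python sketch. It is not the CA algorithm from the paper: the abstract does not describe the layout of the compressed arrays, so this sketch assumes a plain Python integer used as a bitmask of transaction ids as a stand-in for the stored prefix support. The function name `mine_frequent_itemsets` and its parameters are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for a static dataset.

    Illustrative sketch only: support information is held as an integer
    bitmask per itemset (bit t is set if transaction t contains the itemset).
    This is an assumption, not the paper's compressed-array format.
    """
    # Build a vertical layout: one bitmask per single item.
    item_bits = defaultdict(int)
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            item_bits[item] |= 1 << tid

    # Level 1: frequent single items together with their bitmasks.
    level = {
        (item,): bits
        for item, bits in item_bits.items()
        if bin(bits).count("1") >= min_support
    }

    frequent = {}
    while level:
        # Record the support of every itemset found at this level.
        for itemset, bits in level.items():
            frequent[itemset] = bin(bits).count("1")

        # Group itemsets of the current size into equivalence classes:
        # two itemsets share a class if they agree on all but the last item.
        classes = defaultdict(list)
        for itemset in sorted(level):
            classes[itemset[:-1]].append(itemset)

        # Breadth-first step: join pairs inside each class to obtain
        # candidates one item longer, intersecting their bitmasks.
        next_level = {}
        for members in classes.values():
            for i, a in enumerate(members):
                for b in members[i + 1:]:
                    bits = level[a] & level[b]
                    if bin(bits).count("1") >= min_support:
                        next_level[a + (b[-1],)] = bits
        level = next_level

    return frequent
```

Under these assumptions, the support of a candidate is obtained by intersecting the bitmasks of two members of the same class, so the original transactions are scanned only once. Items must be mutually comparable (for example, all strings) so that itemsets can be kept as sorted tuples.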