Discovering frequent itemsets in the presence of highly frequent items

  • Authors: Dennis P. Groth; Edward L. Robertson

  • Affiliations: School of Informatics, Indiana University, Bloomington, IN; Computer Science, Indiana University, Bloomington, IN

  • Venue: INAP'01 Proceedings of the Applications of Prolog 14th international conference on Web knowledge management and decision support
  • Year: 2001

Abstract

This paper presents new techniques for focusing the discovery of frequent itemsets within large, dense datasets that contain highly frequent items. Highly frequent items add significantly to the cost of computing the complete set of frequent itemsets. Our approach excludes such items during the candidate generation phase of the Apriori algorithm. The highly frequent items can then be reintroduced through an inferencing framework, which makes it possible to generate frequent itemsets without counting their frequency. We demonstrate these techniques within the well-studied framework of the Apriori algorithm and provide empirical results on both synthetic and real datasets. Both kinds of data are relevant because the real datasets exhibit statistical characteristics that differ from the probabilistic assumptions behind the synthetic data. The real data were drawn from the U.S. Census.
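
The general idea of excluding highly frequent items and reintroducing them afterwards can be illustrated with a small sketch. The abstract does not describe the inferencing framework itself, so the code below is not the authors' method: it assumes a plain Apriori pass that skips a caller-supplied set of excluded items, and it substitutes a standard inclusion-exclusion lower bound, count(X ∪ {h}) ≥ count(X) + count(h) − N, to infer itemsets containing a highly frequent item h without counting them. The function names `apriori` and `infer_with_frequent_item`, the `excluded` parameter, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_count, excluded=frozenset()):
    """Plain Apriori over a list of sets, ignoring items in `excluded`."""
    # Count single items, skipping the excluded (highly frequent) ones.
    counts = {}
    for t in transactions:
        for item in t:
            if item not in excluded:
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # k-itemsets whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if len(u) == k and all(
                frozenset(s) in frequent for s in combinations(u, k - 1)
            ):
                candidates.add(u)
        # Counting: one scan of the transactions per candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


def infer_with_frequent_item(itemsets, item, item_count, n, min_count):
    """Assumed inference step (not from the paper): a lower bound on
    count(X | {item}) derived from count(X), count(item), and n transactions."""
    inferred = {}
    for s, c in itemsets.items():
        bound = c + item_count - n
        if bound >= min_count:
            # The itemset is guaranteed frequent; `bound` is a lower bound,
            # not an exact count.
            inferred[s | {item}] = bound
    return inferred


# Toy data: "h" plays the role of a highly frequent item.
transactions = [
    {"a", "b", "h"},
    {"a", "b", "h"},
    {"a", "b", "h"},
    {"a", "c", "h"},
    {"b", "h"},
    {"a", "b"},
]
n = len(transactions)
h_count = sum(1 for t in transactions if "h" in t)

found = apriori(transactions, min_count=3, excluded={"h"})
inferred = infer_with_frequent_item(found, "h", h_count, n, min_count=3)
```

In this sketch the bound only certifies that an itemset is frequent. An itemset whose bound falls below the threshold may still be frequent, so a complete method would need a further step for those cases.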