Real world performance of association rule algorithms

  • Authors:
  • Zijian Zheng; Ron Kohavi; Llew Mason

  • Affiliations:
  • Blue Martini Software, San Mateo, CA; Blue Martini Software, San Mateo, CA; Blue Martini Software, San Mateo, CA

  • Venue:
  • Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2001

Abstract

This study compares five well-known association rule algorithms on three real-world datasets and one artificial dataset. The experimental results confirm the performance improvements previously claimed by the algorithms' authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating that the algorithms were overfitted to the IBM artificial dataset. More importantly, we found that the choice of algorithm matters only at support levels that generate more rules than would be useful in practice. For support levels that generate fewer than 1,000,000 rules, which is far more than humans can handle and is sufficient for prediction purposes where the data is loaded into RAM, Apriori finishes processing in less than 10 minutes. On our datasets, we observed super-exponential growth in the number of rules as the minimum support is lowered. On one of our datasets, a 0.02% change in the support increased the number of rules from fewer than a million to over a billion, implying that outside a very narrow range of support values, the choice of algorithm is irrelevant.
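
To make the role of the support threshold concrete, here is a minimal sketch of level-wise Apriori frequent-itemset mining in Python. It is not the implementation benchmarked in the paper. The function name `apriori`, the `min_count` parameter (an absolute support count rather than a percentage), and the toy baskets are all assumptions made for illustration. The sketch also counts frequent itemsets rather than rules. Rules are derived from frequent itemsets, so their number grows at least as quickly.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support_count} for every itemset that occurs in at
    least `min_count` transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item as a 1-itemset.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Toy market-basket data (made up; not one of the paper's datasets).
TOY_BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

if __name__ == "__main__":
    for min_count in (3, 2, 1):
        found = apriori(TOY_BASKETS, min_count)
        print(f"min_count={min_count}: {len(found)} frequent itemsets")
```

On these five baskets, lowering the minimum count from 3 to 2 raises the number of frequent itemsets from 8 to 17, and lowering it to 1 raises it further. The effect is tiny at this scale, but it is the same sensitivity to the support threshold that the abstract reports on real data.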