Real world performance of association rule algorithms

Authors: Zijian Zheng, Ron Kohavi, Llew Mason
Affiliation (all authors): Blue Martini Software, San Mateo, CA
Venue: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2001

Abstract

This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the algorithms' authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to the IBM artificial dataset. More importantly, we found that the choice of algorithm matters only at support levels that generate more rules than would be useful in practice. For support levels that generate fewer than 1,000,000 rules, which is far more than humans can handle and is sufficient for prediction purposes in which the data is loaded into RAM, Apriori finishes processing in less than 10 minutes. On our datasets, we observed super-exponential growth in the number of rules. On one of our datasets, a 0.02% change in the minimum support increased the number of rules from fewer than a million to more than a billion, implying that outside a very narrow range of support values, the choice of algorithm is irrelevant.
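To make the link between the support threshold and the number of rules concrete, here is a minimal, unoptimized sketch of level-wise (Apriori-style) frequent-itemset mining. It is written for illustration only; it is not the Apriori implementation benchmarked in the paper, and the function names `apriori` and `count_rules` and the five-transaction dataset are hypothetical. `count_rules` relies on the fact that a frequent k-itemset yields 2^k - 2 rules X -> Y (X and Y nonempty and disjoint) before any confidence filter is applied.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining (minimal, unoptimized sketch).

    Returns {frozenset(itemset): support_count} for every itemset whose
    relative support (count / number of transactions) is >= min_support.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def keep_frequent(counts):
        # Compare relative support directly to avoid rounding in min_support * n.
        return {s: c for s, c in counts.items() if c / n >= min_support}

    # Pass 1: count single items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = keep_frequent(counts)
    result = dict(frequent)

    k = 2
    while frequent:
        # Join frequent (k-1)-itemsets into k-item candidates, pruning any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {cand: 0 for cand in candidates}
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = keep_frequent(counts)
        result.update(frequent)
        k += 1
    return result


def count_rules(frequent_itemsets):
    """Rules X -> Y (X, Y nonempty, disjoint, X | Y frequent) before any
    confidence filter: each frequent k-itemset yields 2**k - 2 of them
    (zero for single items)."""
    return sum(2 ** len(s) - 2 for s in frequent_itemsets)


# Hypothetical five-transaction toy dataset.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
for s in (0.6, 0.4, 0.2):
    fi = apriori(data, s)
    print(f"min_support={s}: {len(fi)} frequent itemsets, {count_rules(fi)} rules")
# min_support=0.6: 6 frequent itemsets, 6 rules
# min_support=0.4: 7 frequent itemsets, 12 rules
# min_support=0.2: 15 frequent itemsets, 50 rules
```

On this toy data, lowering the minimum support from 60% to 20% raises the rule count from 6 to 50. The 2^k - 2 term is why longer frequent itemsets, which only appear at low support, inflate the rule count so quickly.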