This paper introduces FAST, a novel two-phase sampling-based algorithm for discovering association rules in large databases. In Phase I, a large initial sample of transactions is collected and used to quickly and accurately estimate the support of each individual item in the database. In Phase II, these estimated supports are used to either trim "outlier" transactions or select "representative" transactions from the initial sample, thereby forming a small final sample that more accurately reflects the statistical characteristics (i.e., itemset supports) of the entire database. The expensive operation of discovering association rules is then performed on the final sample. In an empirical study, FAST was able to achieve 90–95% accuracy using a final sample only 15–33% the size of a comparable random sample. This efficiency gain resulted in a speedup by roughly a factor of 10 over previous algorithms that require expensive processing of the entire database, including efficient algorithms that exploit sampling. Our new sampling technique can be used in conjunction with almost any standard association-rule algorithm, and can potentially render scalable other algorithms that mine "count" data.
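The two-phase idea can be illustrated with a short Python sketch of the trimming variant. The abstract does not specify the distance measure or the order in which transactions are removed, so the choices here are assumptions made only for illustration: an L1 distance over single-item supports, and greedy removal of whichever transaction brings the remaining sample closest to the Phase I estimates. The function names (`item_supports`, `l1_distance`, `trim_sample`) are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def item_supports(transactions):
    """Return the fraction of transactions that contain each item."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def l1_distance(counts, size, target):
    """L1 distance between the supports implied by `counts` and `target`.

    Assumption: the abstract does not name a distance; L1 is used here.
    """
    return sum(abs(counts.get(item, 0) / size - s) for item, s in target.items())


def trim_sample(initial_sample, final_size):
    """Greedily trim `initial_sample` down to `final_size` transactions.

    Phase I: estimate single-item supports from the initial sample.
    Phase II: repeatedly drop the transaction whose removal leaves the
    remaining supports closest to those estimates (an assumed reading
    of "trimming outliers").
    """
    if not 1 <= final_size <= len(initial_sample):
        raise ValueError("final_size must be between 1 and len(initial_sample)")

    target = item_supports(initial_sample)               # Phase I estimates
    sample = [frozenset(t) for t in initial_sample]
    counts = Counter(item for t in sample for item in t)

    while len(sample) > final_size:                      # Phase II trimming
        best_idx, best_dist = None, float("inf")
        for i, t in enumerate(sample):
            trial = counts.copy()
            trial.subtract(t)                            # item counts without t
            d = l1_distance(trial, len(sample) - 1, target)
            if d < best_dist:
                best_idx, best_dist = i, d
        removed = sample.pop(best_idx)
        counts.subtract(removed)
    return sample


# Usage: draw a simple random initial sample, then trim it.
rng = random.Random(0)
database = [frozenset(rng.sample("abcdef", rng.randint(1, 4))) for _ in range(500)]
initial = rng.sample(database, 100)
final = trim_sample(initial, 25)
```

This greedy loop rescans the whole sample for every removal, so it is quadratic in the sample size and only suitable for small illustrations. A standard association-rule miner would then be run on `final` in place of the full database.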