Evaluation of sampling for data mining of association rules

Authors:
M. J. Zaki;S. Parthasarathy;Wei Li;M. Ogihara
Venue:
RIDE '97 Proceedings of the 7th International Workshop on Research Issues in Data Engineering (RIDE '97) High Performance Database Management for Large-Scale Applications
Year:
1997

Citing 0
Cited 53

A localized algorithm for parallel association mining

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Mining long sequential patterns in a noisy environment

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
An efficient and effective algorithm for density biased sampling

Proceedings of the eleventh international conference on Information and knowledge management
Parallel Algorithms for Discovery of Association Rules

Data Mining and Knowledge Discovery
A Survey of Methods for Scaling Up Inductive Algorithms

Data Mining and Knowledge Discovery
Scalable Algorithms for Association Mining

IEEE Transactions on Knowledge and Data Engineering
Parallel GA-Based Wrapper Feature Selection for Spectroscopic Data Mining

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Towards Network-Aware Data Mining

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Clustering Distributed Homogeneous Datasets

PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Enhancing the Apriori Algorithm for Frequent Set Counting

DaWaK '01 Proceedings of the Third International Conference on Data Warehousing and Knowledge Discovery
Data Reduction via Conflicting Data Analysis

ISMIS '00 Proceedings of the 12th International Symposium on Foundations of Intelligent Systems
Scheduling High Performance Data Mining Tasks on a Data Grid Environment

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Formal Logics of Discovery and Hypothesis Formation by Machine

DS '98 Proceedings of the First International Conference on Discovery Science
Parallel and Distributed Data Mining: An Introduction

Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
Optimized Disjunctive Association Rules via Sampling

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Statistical properties of transactional databases

Proceedings of the 2004 ACM symposium on Applied computing
Iterative Projected Clustering by Subspace Mining

IEEE Transactions on Knowledge and Data Engineering
Compression, Clustering, and Pattern Discovery in Very High-Dimensional Discrete-Attribute Data Sets

IEEE Transactions on Knowledge and Data Engineering
Elastic Translation Invariant Matching of Trajectories

Machine Learning
Indexed-based density biased sampling for clustering applications

Data & Knowledge Engineering
Multi-scaling sampling: an adaptive sampling method for discovering approximate association rules

Journal of Computer Science and Technology
Quality-Aware Sampling and Its Applications in Incremental Data Mining

IEEE Transactions on Knowledge and Data Engineering
Power-law relationship and self-similarity in the itemset support distribution: analysis and applications

The VLDB Journal — The International Journal on Very Large Data Bases
A survey on algorithms for mining frequent itemsets over data streams

Knowledge and Information Systems
Feature-preserved sampling over streaming data

ACM Transactions on Knowledge Discovery from Data (TKDD)
Identifying appropriate methodologies and strategies for vertical mining with incomplete data

WSEAS Transactions on Computers
Analysis of sampling techniques for association rule mining

Proceedings of the 12th International Conference on Database Theory
A lower bound on the sample size needed to perform a significant frequent pattern mining task

Pattern Recognition Letters
Vertical mining with incomplete data

MAMECTIS'08 Proceedings of the 10th WSEAS international conference on Mathematical methods, computational techniques and intelligent systems
Efficient Frequent Itemsets Mining by Sampling

Proceedings of the 2006 conference on Advances in Intelligent IT: Active Media Technology 2006
Mining in Large Noisy Domains

Journal of Data and Information Quality (JDIQ)
Which Is Better for Frequent Pattern Mining: Approximate Counting or Sampling?

DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
A test paradigm for detecting changes in transactional data streams

DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Focusing solutions for data mining: analytical studies and experimental results in real-world domains

Focusing solutions for data mining: analytical studies and experimental results in real-world domains
Frequent subgraph mining on a single large graph using sampling techniques

Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Mining top-K frequent itemsets through progressive sampling

Data Mining and Knowledge Discovery
A comparison between approximate counting and sampling methods for frequent pattern mining on data streams

Intelligent Data Analysis
A parallel, distributed algorithm for relational frequent pattern discovery from very large data sets

Intelligent Data Analysis - Ubiquitous Knowledge Discovery
Discovery of frequent patterns in transactional data streams

Transactions on large-scale data- and knowledge-centered systems II
Locality sensitive hashing for sampling-based algorithms in association rule mining

Expert Systems with Applications: An International Journal
Direct local pattern sampling by efficient two-step random procedures

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Parallel mining of maximal sequential patterns using multiple samples

The Journal of Supercomputing
On exploring the power-law relationship in the itemset support distribution

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Sampling ensembles for frequent patterns

FSKD'05 Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I
Progressive sampling for association rules based on sampling error estimation

PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Effective sampling for mining association rules

AI'04 Proceedings of the 17th Australian joint conference on Advances in Artificial Intelligence
ML-DS: a novel deterministic sampling algorithm for association rules mining

ICDM'12 Proceedings of the 12th Industrial conference on Advances in Data Mining: applications and theoretical aspects
PARMA: a parallel randomized algorithm for approximate association rules mining in MapReduce

Proceedings of the 21st ACM international conference on Information and knowledge management
Efficient discovery of association rules and frequent itemsets through sampling with tight performance guarantees

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
A new parallel association rule mining algorithm on distributed shared memory system

International Journal of Business Intelligence and Data Mining
Crowd mining

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
CrowdMiner: mining association rules from the crowd

Proceedings of the VLDB Endowment

Abstract

The discovery of association rules is a prototypical problem in data mining. Current algorithms for mining association rules make repeated passes over the database to determine the frequently occurring itemsets (sets of items). For large databases, the I/O overhead of scanning the database can be extremely high. The authors show that random sampling of transactions in the database is an effective method for finding association rules. Sampling can speed up the mining process by more than an order of magnitude by reducing I/O costs and drastically shrinking the number of transactions to be considered. The sampled database may also be small enough to reside in main memory. Furthermore, the authors show that a sample can accurately represent the data patterns in the database with high confidence. They experimentally evaluate the effectiveness of sampling on different databases and study the relationship between the performance, accuracy, and confidence of the chosen sample.
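
The idea in the abstract, mining a random sample of transactions instead of the full database, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sample_transactions`, `frequent_itemsets`, `make_synthetic_db`), the synthetic item names, the 10% sampling fraction, and the 0.2 support threshold are all assumptions chosen for illustration, and the brute-force itemset counter stands in for a real miner such as Apriori.

```python
import random
from collections import Counter
from itertools import combinations


def sample_transactions(transactions, fraction, seed=0):
    """Keep each transaction independently with probability `fraction`."""
    rng = random.Random(seed)
    return [t for t in transactions if rng.random() < fraction]


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: relative support} for itemsets of up to `max_size`
    items whose relative support is at least `min_support`.
    Brute-force counting; adequate only for tiny item universes."""
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def make_synthetic_db(n, seed=1):
    """Hypothetical toy database: bread and butter co-occur in about 40%
    of transactions, milk appears in about 30%, caviar in about 5%."""
    rng = random.Random(seed)
    db = []
    for _ in range(n):
        t = set()
        if rng.random() < 0.4:
            t.update(["bread", "butter"])
        if rng.random() < 0.3:
            t.add("milk")
        if rng.random() < 0.05:
            t.add("caviar")
        db.append(t)
    return db


if __name__ == "__main__":
    db = make_synthetic_db(20000)
    sample = sample_transactions(db, fraction=0.1)

    full = frequent_itemsets(db, min_support=0.2)
    approx = frequent_itemsets(sample, min_support=0.2)

    print(f"database size: {len(db)}, sample size: {len(sample)}")
    for itemset in sorted(full):
        estimate = approx.get(itemset)
        print(itemset, f"full={full[itemset]:.3f}", f"sample={estimate}")
```

In this sketch the sample is scanned in place of the full database, so the counting work shrinks roughly in proportion to the sampling fraction. The price is that itemsets whose true support lies close to the threshold may be missed or falsely reported, which is the performance versus accuracy trade-off the abstract refers to.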