Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules

Authors:
S. D. Lee; David W. Cheung; Ben Kao
Affiliations:
Department of Computer Science, The University of Hong Kong, Hong Kong. sdlee@cs.hku.hk; Department of Computer Science, The University of Hong Kong, Hong Kong. dcheung@cs.hku.hk; Department of Computer Science, The University of Hong Kong, Hong Kong. kao@cs.hku.hk
Venue:
Data Mining and Knowledge Discovery
Year:
1998


Abstract

By nature, sampling is an appealing technique for data mining, because in most cases an approximate solution already satisfies the needs of users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have addressed the problem of maintaining discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part of the original database, which is very large, and hence take much time. Worse yet, if the rules are updated frequently but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the mined association rules or not. If the estimated difference is small, then the rules in the original database are still a good approximation to those in the updated database. Hence, we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overhead of updating the rules too frequently. Experimental results show that our algorithm is very efficient and highly accurate.
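
The decision procedure described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it only shows the general idea of estimating, from a random sample of the updated database, how many itemsets have changed their frequent/infrequent status, and of skipping rule maintenance when that estimate is small. The names `support`, `estimate_drift`, and `should_update`, the candidate list, and the 0.1 threshold are all hypothetical choices made for illustration.

```python
import random


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def estimate_drift(old_frequent, candidates, updated_db,
                   min_support, sample_size, seed=0):
    """Estimate the fraction of candidate itemsets whose status changed.

    Assumption (not from the paper): status in the updated database is
    judged from a simple random sample drawn without replacement.
    """
    rng = random.Random(seed)
    sample = rng.sample(updated_db, min(sample_size, len(updated_db)))
    changed = 0
    for itemset in candidates:
        was_frequent = itemset in old_frequent
        is_frequent = support(itemset, sample) >= min_support
        if was_frequent != is_frequent:
            changed += 1
    return changed / len(candidates)


def should_update(drift, threshold=0.1):
    """Re-mine the rules only when the estimated drift exceeds the threshold."""
    return drift > threshold


# Toy data: itemsets and transactions are frozensets of single-letter items.
old_frequent = {frozenset("a"), frozenset("b"), frozenset("ab")}
candidates = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ab")]
updated_db = ([frozenset("ab")] * 3 + [frozenset("ac")] * 4
              + [frozenset("b")] * 3)

drift = estimate_drift(old_frequent, candidates, updated_db,
                       min_support=0.5, sample_size=10)
decision = should_update(drift)
```

In this toy run the sample covers the whole updated database, so the estimate is exact: only the itemset {a, b} loses its frequent status. With a genuinely smaller sample the estimate carries sampling error, which is why a practical method would also need a statistical bound on that error before deciding to defer the update.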