Discovering associations in very large databases by approximating

Authors:
Shichao Zhang;Chengqi Zhang
Affiliations:
Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia and Guangxi Teachers University, Guilin, P R China;Faculty of Information Technology, University of Technology, Sydney, PO Box 123, Broadway NSW 2007, Australia
Venue:
Acta Cybernetica
Year:
2003



Abstract

Mining association rules has posed a great challenge to the research community. Despite efforts to design fast and efficient mining algorithms, it remains a time-consuming process for very large databases. In this paper, we adopt a slightly different approach to this problem, one that mines approximate association rules quickly. By treating the database as a set of randomly appended records, we can apply the central limit theorem to estimate the size of a random subset of the database, and discover both positive and negative association rules by generating all possibly useful itemsets from that subset. However, because of approximation errors, some valid rules may be missed, while some invalid rules may be generated. To deal with this problem, we adopt a two-phase approach. First, we discover all promising approximate rules from a random sample of the database. Second, these approximate results are used as heuristic information in an efficient algorithm that requires only one pass over the database to validate rules whose support and confidence are close to the desired support and confidence values. We evaluated the proposed technique, and our experimental results demonstrate that the approach is efficient and promising.
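
The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the paper's sample-size bound or its validation procedure, so the sketch assumes the standard worst-case normal-approximation bound n >= z^2 / (4 e^2), covers only itemset support (not confidence or negative rules), and uses hypothetical function names (sample_size, candidate_itemsets, validate) and a brute-force candidate search.

```python
import math
from itertools import combinations
from statistics import NormalDist


def sample_size(error, confidence):
    """Worst-case sample size from the normal (CLT) approximation.

    Assumption: each sampled record is an independent Bernoulli trial for
    "contains the itemset", so the variance is at most 1/4.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z / (4 * error * error))


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def candidate_itemsets(sample, min_support, error, max_size=2):
    """Phase 1: keep itemsets whose sample support is at least
    min_support - error, so borderline itemsets are not discarded early.
    Brute-force enumeration, for illustration only."""
    items = sorted(set().union(*sample))
    found = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            if support(s, sample) >= min_support - error:
                found.append(s)
    return found


def validate(candidates, database, min_support):
    """Phase 2: a single pass over the full database counts every
    candidate and keeps those that truly meet min_support."""
    counts = {c: 0 for c in candidates}
    total = 0
    for t in database:
        total += 1
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return {c: n / total for c, n in counts.items() if n / total >= min_support}
```

Under these assumptions, sample_size(0.01, 0.95) returns 9604, independent of the database size. In use, one would draw that many records at random (for example with random.sample), pass them to candidate_itemsets, and then call validate on the full database. The second phase can only discard false positives among the candidates; itemsets missed in the first phase stay missed, which is why the first phase relaxes the threshold by the error margin.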