Distributed data mining in a chain store database of short transactions

Authors:
Cheng-Ru Lin;Chang-Hung Lee;Ming-Syan Chen;Philip S. Yu
Affiliations:
National Taiwan University, Taipei, Taiwan, ROC;National Taiwan University, Taipei, Taiwan, ROC;National Taiwan University, Taipei, Taiwan, ROC;IBM T. J. Waston Research Center, Yorktown NY
Venue:
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2002

Citing 15
Cited 3

Mining association rules between sets of items in large databases

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Optimization of constrained frequent set queries with 2-variable constraints

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Mining association rules with multiple minimum supports

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mining frequent patterns without candidate generation

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Discovering unexpected information from your competitors' web sites

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient discovery of error-tolerant frequent itemsets in high dimensions

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Sliding-window filtering: an efficient algorithm for incremental mining

Proceedings of the tenth international conference on Information and knowledge management
Data Mining: An Overview from a Database Perspective

IEEE Transactions on Knowledge and Data Engineering
Using a Hash-Based Method with Transaction Trimming for Mining Association Rules

IEEE Transactions on Knowledge and Data Engineering
Efficient Data Mining for Path Traversal Patterns

IEEE Transactions on Knowledge and Data Engineering
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
On Mining General Temporal Association Rules in a Publication Database

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Building Hierarchical Classifiers Using Class Proximity

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Mining Generalized Association Rules

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases

Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system

Expert Systems with Applications: An International Journal
Tidset-based parallel FP-tree algorithm for the frequent pattern mining problem on PC clusters

GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
Load balancing approach parallel algorithm for frequent pattern mining

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we broaden the horizon of traditional rule mining by introducing a new framework of causality rule mining in a distributed chain store database. Specifically, the causality rule explored in this paper consists of a sequence of triggering events and a set of consequential events, and is designed with the capability of mining non-sequential, inter-transaction information. Hence, the causality rule mining provides a very general framework for rule derivation. Note, however, that the procedure of causality rule mining is very costly particularly in the presence of a huge number of candidate sets and a distributed database, and in our opinion, cannot be dealt with by direct extensions from existing rule mining methods. Consequently, we devise in this paper a series of level matching algorithms, including Level Matching (abbreviatedly as LM), Level Matching with Selective Scan (abbreviatedly as LMS), and Distributed Level Matching (abbreviatedly as Distibuted LM), to minimize the computing cost needed for the distributed data mining of causality rules. In addition, the phenomena of time window constraints are also taken into consideration for the development of our algorithms. As a result of properly employing the technologies of level matching and selective scan, the proposed algorithms present good efficiency and scalability in the mining of local and global causality rules. Scale-up experiments show that the proposed algorithms scale well with the number of sites and the number of customer transactions.Index Terms: knowledge discovery, distributed data mining causality rules, triggering events, consequential events