This paper investigates a number of alternative strategies for distributed/parallel association rule mining. The methods examined make use of a data structure, the T-tree, introduced previously by the authors for organizing the sets of attributes whose support is being counted. We consider six different approaches, each representing a different way of parallelizing the basic Apriori-T algorithm. The approaches differ in how they partition the data between processes and in how they reduce message-passing overhead. Both ‘horizontal’ (data distribution) and ‘vertical’ (candidate distribution) partitioning strategies are considered, including a vertical partitioning algorithm (DATA-VP) that we developed to exploit the structure of the T-tree. We present experimental results examining the performance of the methods in implementations using JavaSpaces. We conclude that, in a JavaSpaces environment, candidate distribution strategies outperform those that distribute the original dataset because of their lower messaging overhead, and that the results for the DATA-VP algorithm are especially encouraging.
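As a rough illustration of the two partitioning families named above, the following Python sketch counts support for a fixed set of candidate itemsets in two ways: by splitting the transactions across workers (data distribution) and by splitting the candidates across workers (candidate distribution). It is not the Apriori-T or DATA-VP algorithm: it uses no T-tree and no JavaSpaces, the workers are simulated by a loop, and all function names are hypothetical. In this simplified version each candidate-distribution worker still scans every transaction, whereas a real vertical scheme would typically give each worker only the attributes it needs.

```python
from collections import Counter


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter({c: 0 for c in candidates})
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


def horizontal_counts(transactions, candidates, n_workers):
    """Data distribution: each worker holds a slice of the rows and
    counts every candidate; the partial counts are then summed."""
    total = Counter({c: 0 for c in candidates})
    for w in range(n_workers):
        rows = transactions[w::n_workers]  # round-robin row partition
        # Merging the partial counts stands in for a message exchange.
        total.update(count_support(rows, candidates))
    return total


def vertical_counts(transactions, candidates, n_workers):
    """Candidate distribution: each worker owns a slice of the
    candidates, so no per-candidate counts need to be summed."""
    total = Counter()
    ordered = sorted(candidates)
    for w in range(n_workers):
        owned = ordered[w::n_workers]  # round-robin candidate partition
        total.update(count_support(transactions, owned))
    return total
```

Both functions return the same totals. What differs is the traffic between workers: the horizontal version must combine a full set of partial counts from every worker, while the vertical version only collects disjoint results.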