An abstraction based communication efficient distributed association rule mining

Authors:
P. Santhi Thilagam;V. S. Ananthanarayana
Affiliations:
Dept. of Computer Engineering, NITK, Surathkal, India;Dept. of Information Technology, NITK, Surathkal, India
Venue:
ICDCN'08 Proceedings of the 9th international conference on Distributed computing and networking
Year:
2008

Abstract

Association rule mining is one of the most widely researched areas because of its applicability in many fields. We propose a novel data structure, the Sequence Pattern Count (SPC) tree, which stores the database compactly and completely and can be constructed in a single scan of the database. Because the SPC tree is complete with respect to the database, it is well suited to mining association rules under changing data and changing support thresholds without rebuilding the tree. A performance study shows that the SPC tree is efficient and scalable. We also propose the Doubly Logarithmic-depth Tree (DLT) algorithm, which uses the SPC tree to mine large, geographically distributed datasets efficiently while minimizing communication and computation costs. DLT requires only O(n) messages to exchange support counts, and the exchange completes in O(log log n) time.
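The abstract does not describe the internal layout of the SPC tree, so the following is only a rough illustration of the general idea, not the authors' structure. It uses a generic count-annotated prefix tree, built in one pass over the transactions, that keeps enough information to reproduce the database as a multiset of itemsets and to answer support queries at any threshold without being rebuilt. All names here (`Node`, `build_tree`, `iter_transactions`, `frequent_itemsets`) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


class Node:
    """One item on a path (hypothetical layout, not the paper's SPC node)."""

    def __init__(self):
        self.children = {}  # item -> Node
        self.count = 0      # transactions whose path passes through this node
        self.ends = 0       # transactions that end exactly at this node


def build_tree(transactions):
    """Single scan: insert each transaction as a path of its sorted, distinct items."""
    root = Node()
    for t in transactions:
        node = root
        node.count += 1
        for item in sorted(set(t)):
            node = node.children.setdefault(item, Node())
            node.count += 1
        node.ends += 1
    return root


def iter_transactions(node, prefix=()):
    """Yield (itemset, multiplicity) pairs; together they reproduce the database."""
    if node.ends:
        yield prefix, node.ends
    for item, child in node.children.items():
        yield from iter_transactions(child, prefix + (item,))


def frequent_itemsets(root, min_support, max_size=2):
    """Count itemsets of up to max_size items using only the tree."""
    counts = Counter()
    for items, mult in iter_transactions(root):
        for k in range(1, max_size + 1):
            for subset in combinations(items, k):
                counts[subset] += mult
    return {s: c for s, c in counts.items() if c >= min_support}


# Toy database (made up for illustration).
db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
tree = build_tree(db)

# The same tree serves two different absolute support thresholds.
print(frequent_itemsets(tree, min_support=4))
print(frequent_itemsets(tree, min_support=3))
```

Because every transaction is retained as a path with its multiplicity, lowering or raising the support threshold only changes the final filter, and a new transaction only adds or extends one path. The brute-force subset enumeration is chosen for brevity and would not scale; it stands in for whatever mining procedure the paper actually uses.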