Toward boosting distributed association rule mining by data de-clustering

Authors:
Frank S. C. Tseng;Yen-Hung Kuo;Yueh-Min Huang
Affiliations:
Department of Information Management, National Kaohsiung First University of Science and Technology, No. 1, University Road, YenChao, 824 Kaohsiung County, Taiwan, ROC;Innovative DigiTech-Enabled Applications and Services Institute, Institute for Information Industry, 8F., No. 133, Sec. 4, Minsheng East Road, Taipei City 105, Taiwan, ROC;Department of Engineering Science, National Cheng Kung University, No. 1, University Road, Tainan 701, Taiwan, ROC
Venue:
Information Sciences: an International Journal
Year:
2010

Abstract

Existing parallel algorithms for association rule mining either incur a large inter-site communication cost or require a large amount of space to maintain the local support counts of a large number of candidate sets. This study proposes a de-clustering approach for distributed architectures that eliminates the inter-site communication cost for most of the influential association rule mining algorithms. To de-cluster the database into similar partitions, an efficient algorithm is developed to approximate the shortest spanning path (SSP) that links the transaction data together. The resulting SSP is then used to de-cluster the transaction data evenly into subgroups. The proposed approach guarantees that all subgroups are similar to each other and to the original group. Experimental results show that data size and the number of items are the only two factors that determine the performance of de-clustering. Additionally, based on this approach, most of the influential association rule mining algorithms can be implemented in a distributed architecture to obtain a drastic increase in speed without losing any frequent itemsets. Furthermore, the data distribution in each de-clustered participant is almost the same as that of a single site, which implies that the proposed approach can be regarded as a sampling method for distributed association rule mining. Finally, the experimental results show that mining results that were originally inadequate can be improved to an almost perfect level.
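
The abstract does not spell out how the SSP is approximated or how the path is cut into subgroups, so the following is only a rough sketch of the general idea and not the authors' algorithm. It assumes a Hamming distance between itemsets, a greedy nearest-neighbor heuristic for the spanning path, and a round-robin assignment along the path; the function names `distance`, `greedy_spanning_path`, and `decluster` are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List, Sequence

Transaction = FrozenSet[str]


def distance(a: Transaction, b: Transaction) -> int:
    """Hamming distance between two itemsets (assumed metric):
    the number of items present in exactly one of them."""
    return len(a ^ b)


def greedy_spanning_path(transactions: Sequence[Transaction]) -> List[int]:
    """Approximate a short spanning path with a nearest-neighbor heuristic.

    Starts at index 0 and repeatedly moves to the closest unvisited
    transaction, breaking ties by the lower index. Returns the visiting
    order as a list of indices. This is a stand-in heuristic, not the
    approximation algorithm from the paper.
    """
    if not transactions:
        return []
    unvisited = set(range(1, len(transactions)))
    path = [0]
    while unvisited:
        last = transactions[path[-1]]
        nxt = min(unvisited, key=lambda i: (distance(last, transactions[i]), i))
        path.append(nxt)
        unvisited.remove(nxt)
    return path


def decluster(transactions: Sequence[Transaction], k: int) -> List[List[Transaction]]:
    """Deal transactions into k partitions in round-robin order along the path.

    Neighbors on the path are similar, so sending consecutive path
    positions to different partitions spreads each group of similar
    transactions across all partitions.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    parts: List[List[Transaction]] = [[] for _ in range(k)]
    for pos, idx in enumerate(greedy_spanning_path(transactions)):
        parts[pos % k].append(transactions[idx])
    return parts


if __name__ == "__main__":
    data = [
        frozenset({"bread", "milk"}),
        frozenset({"beer", "chips"}),
        frozenset({"bread", "milk"}),
        frozenset({"beer", "chips"}),
    ]
    for site, part in enumerate(decluster(data, 2)):
        print(site, [sorted(t) for t in part])
```

In this toy input, identical transactions end up adjacent on the path, so each of the two partitions receives one transaction of each kind. That mirrors the abstract's claim that every subgroup resembles the original data, although the nearest-neighbor heuristic used here costs O(n^2) distance computations and carries none of the paper's guarantees.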