CLAP: Collaborative pattern mining for distributed information systems

Authors:
Xingquan Zhu;Bin Li;Xindong Wu;Dan He;Chengqi Zhang
Affiliations:
QCIS Centre, Faculty of Eng. & Info. Technology, Univ. of Technology, Sydney, Ultimo 2007, Australia and Dept. of Computer Science & Eng., Florida Atlantic University, Boca Raton, FL 33431, USA;QCIS Centre, Faculty of Eng. & Info. Technology, Univ. of Technology, Sydney, Ultimo 2007, Australia;Dept. of Computer Science, University of Vermont, Burlington VT 05404, USA and School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, China;Dept. of Computer Science, Univ. of California at Los Angeles, Los Angeles, CA, 90095, USA;QCIS Centre, Faculty of Eng. & Info. Technology, Univ. of Technology, Sydney, Ultimo 2007, Australia
Venue:
Decision Support Systems
Year:
2011

Abstract

Data mining over distributed information systems usually has three goals: (1) identifying locally significant patterns in individual databases; (2) discovering significant patterns that emerge once the distributed databases are unified into a single view; and (3) finding patterns that follow special relationships across different data collections. Existing research has significantly advanced techniques for mining local and global patterns (the first two goals), but few attempts have been made to discover patterns across distributed databases (the third goal), and no existing framework supports mining all three types of patterns. This paper proposes solutions for discovering patterns from distributed databases. Specifically, we treat pattern mining as a query process whose purpose is to discover patterns from distributed databases such that the relationships among the patterns satisfy user-specified query constraints. We argue that existing self-contained mining frameworks are neither efficient nor feasible for this objective, mainly because their pattern pruning is oriented toward a single database. To solve the problem, we advocate a cross-database pruning concept and propose a collaborative pattern (CLAP) mining framework with cross-database pruning mechanisms for distributed pattern mining. In CLAP, distributed databases collaboratively exchange pattern information so that each site can use information from other sites to achieve cross-database pruning. Experimental results show that CLAP fits a niche position: it not only outperforms its peers with significant runtime gains, but also finds patterns that the other methods cannot discover.
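
The cross-database pruning idea described in the abstract can be illustrated with a small sketch. This is not the CLAP algorithm itself, whose details the abstract does not give. It is a toy level-wise (Apriori-style) search under one assumed query constraint: an itemset must reach a minimum support in both of two databases. The function names, the thresholds `min_a` and `min_b`, and the sample transactions are all hypothetical.

```python
from itertools import combinations


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def collaborative_mine(db_a, db_b, min_a, min_b):
    """Toy level-wise search for itemsets frequent in BOTH databases.

    Assumption: the query is support_A >= min_a AND support_B >= min_b.
    A candidate that fails at either site is dropped for both sites,
    so its supersets are never generated anywhere.
    """
    # Only items present at both sites can satisfy the joint constraint.
    items = sorted(set().union(*db_a) & set().union(*db_b))
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        survivors = []
        for cand in level:
            sup_a = support(cand, db_a)
            if sup_a < min_a:
                continue  # pruned by site A's information
            sup_b = support(cand, db_b)
            if sup_b < min_b:
                continue  # pruned by site B's information
            result[cand] = (sup_a, sup_b)
            survivors.append(cand)
        # Join survivors into candidates one item larger, keeping only
        # those whose every sub-itemset survived at both sites.
        surv_set = set(survivors)
        k = len(level[0]) + 1
        next_level = set()
        for x, y in combinations(survivors, 2):
            u = x | y
            if len(u) == k and all(
                frozenset(s) in surv_set for s in combinations(u, k - 1)
            ):
                next_level.add(u)
        level = sorted(next_level, key=sorted)
    return result


# Hypothetical sample data for two sites.
db_a = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
db_b = [{"a", "b"}, {"a", "b", "d"}, {"a", "d"}, {"c"}]
found = collaborative_mine(db_a, db_b, min_a=0.5, min_b=0.5)
for itemset, sups in found.items():
    print(sorted(itemset), sups)
```

In this sample, item `c` is frequent in `db_a` but not in `db_b`, so the itemsets `{a, c}` and `{b, c}` are never counted at either site. A miner that pruned with `db_a` alone would have explored both before discarding them at the final comparison step.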