We extend the problem of association rule mining, a key data mining problem, to systems in which the database is partitioned among a very large number of computers that are dispersed over a wide area. Such computing systems include GRID computing platforms, federated database systems, and peer-to-peer computing environments. The scale of these systems poses several difficulties, such as the impracticality of global communications and global synchronization, dynamic topology changes of the network, on-the-fly data updates, the need to share resources with other applications, and the frequent failure and recovery of resources.

We present an algorithm by which every node in the system can reach the exact solution, as if it were given the combined database. The algorithm is entirely asynchronous, imposes very little communication overhead, transparently tolerates network topology changes and node failures, and quickly adjusts to changes in the data as they occur. Simulations of up to 10,000 nodes show that the algorithm is local: all rules, except for those whose confidence is about equal to the confidence threshold, are discovered using information gathered from a very small vicinity, whose size is independent of the size of the system.
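To make the target of the computation concrete, the sketch below shows what "the exact solution, as if it were given the combined database" means for a single rule. It is not the algorithm described above, which is local and asynchronous; it only illustrates that the support and confidence of a rule over a partitioned database are determined by sums of per-partition counts. The function names (`local_counts`, `rule_stats`) and the toy transactions are assumptions made for illustration.

```python
from fractions import Fraction

def local_counts(transactions, antecedent, consequent):
    """Counts computed from one partition only (hypothetical helper).

    Returns (partition size, transactions containing the antecedent,
    transactions containing antecedent and consequent together).
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return len(transactions), n_a, n_both

def rule_stats(partitions, antecedent, consequent):
    """Support and confidence of antecedent -> consequent over all partitions.

    Only the three per-partition counts are combined; no transaction
    leaves its partition.
    """
    total = n_a = n_both = 0
    for part in partitions:
        size, a, both = local_counts(part, antecedent, consequent)
        total += size
        n_a += a
        n_both += both
    support = Fraction(n_both, total) if total else Fraction(0)
    confidence = Fraction(n_both, n_a) if n_a else Fraction(0)
    return support, confidence

# Toy data: three partitions, each a list of transactions (sets of items).
partitions = [
    [frozenset({"bread", "milk"}), frozenset({"bread"})],
    [frozenset({"bread", "milk", "eggs"}), frozenset({"milk"})],
    [frozenset({"bread", "milk"})],
]
combined = [t for part in partitions for t in part]

split_result = rule_stats(partitions, {"bread"}, {"milk"})
combined_result = rule_stats([combined], {"bread"}, {"milk"})
```

Here `split_result` and `combined_result` are identical, which is the correctness condition the abstract states. The sketch sums counts from every partition; the abstract's locality claim is that, for most rules, a node can decide whether the rule holds after hearing from only a small neighbourhood.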