Data-intensive Peer-to-Peer (P2P) networks are finding an increasing number of applications, and data mining in such P2P environments is a natural extension. However, common monolithic data mining architectures do not fit well in such environments, since they typically require centralizing the distributed data, which is usually not practical in a large P2P network. Distributed data mining algorithms that avoid large-scale synchronization or data centralization offer an alternative. This paper considers the distributed K-means clustering problem, where the data and computing resources are distributed over a large P2P network. It offers two algorithms, each of which approximates the result produced by the standard centralized K-means clustering algorithm. The first is designed to operate in a dynamic P2P network and produces clusterings using only “local” synchronization. The second uses uniformly sampled peers and provides analytical guarantees on the accuracy of clustering over a P2P network. Empirical results show that both algorithms perform well compared to their centralized counterparts, at a modest communication cost.
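To make the idea of clustering from uniformly sampled peers concrete, the following is a minimal Python sketch. It is not the algorithm from the paper: the function names (`nearest`, `local_stats`, `sampled_kmeans`) are hypothetical, each peer is assumed to be a plain list of points, and an in-process loop stands in for network messages. In each round, only a uniform sample of peers reports per-cluster sums and counts, so raw data never leaves a peer and the communication cost scales with the sample size rather than the network size.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_stats(points, centroids):
    """Compute per-cluster coordinate sums and counts over one peer's local points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in points:
        j = nearest(pt, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += pt[d]
    return sums, counts


def sampled_kmeans(peers, centroids, sample_size, iterations, rng):
    """Approximate K-means where each round uses statistics from a uniform peer sample.

    Assumption: `peers` is a list of local datasets, one per peer, and `rng`
    is a random.Random instance used to draw the sample without replacement.
    """
    centroids = [list(c) for c in centroids]
    dim = len(centroids[0])
    for _ in range(iterations):
        sample = rng.sample(peers, sample_size)
        total_sums = [[0.0] * dim for _ in centroids]
        total_counts = [0] * len(centroids)
        # Only the sampled peers contribute; they send sums and counts, not data.
        for points in sample:
            sums, counts = local_stats(points, centroids)
            for j in range(len(centroids)):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        for j in range(len(centroids)):
            # Keep the previous centroid if no sampled point was assigned to it.
            if total_counts[j] > 0:
                centroids[j] = [s / total_counts[j] for s in total_sums[j]]
    return centroids
```

Calling `sampled_kmeans(peers, initial_centroids, sample_size=10, iterations=5, rng=random.Random(0))` returns the updated centroids as a list of coordinate lists. A larger `sample_size` brings the result closer to what a centralized K-means step would compute, at a higher communication cost; the sketch makes no claim about the analytical accuracy bounds the abstract mentions.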