Many distributed data mining (DDM) tasks, such as distributed association rules and distributed classification, have been proposed and developed in the last few years. However, only a few studies address distributed clustering for analysing large, heterogeneous and distributed datasets. This is especially true of distributed density-based clustering, although the centralised versions of the technique have been widely used in different real-world applications. In this paper, we present a new approach for distributed density-based clustering. Our approach is based on two main concepts: the extension of local models created by DBSCAN at each node of the system, and the aggregation of these local models by using tree-based topologies to construct global models. The preliminary evaluation shows that our approach is efficient and flexible, and that it is well suited to high-density datasets with a moderate difference in dataset distributions among the sites.
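The two concepts named in the abstract (local DBSCAN models at each node, then aggregation along a tree) can be illustrated with a minimal sketch. The abstract does not specify how local models are represented or extended, nor the exact tree topology, so everything below is an assumption made for illustration: each local model is simply the list of clusters found at a site, the tree is a plain binary reduction, and two clusters are merged when any pair of their points lies within `eps`. The names `dbscan`, `merge_models`, and `aggregate` are hypothetical and do not come from the paper.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on one site; returns a list of clusters (noise is dropped)."""
    labels = {}  # point index -> cluster id, or -1 for noise

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cid = 0
    for i in range(len(points)):
        if i in labels:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        labels[i] = cid
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue  # already assigned to a cluster
            labels[j] = cid
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb_j)
        cid += 1

    clusters = [[] for _ in range(cid)]
    for i, c in labels.items():
        if c >= 0:
            clusters[c].append(points[i])
    return clusters


def merge_models(a, b, eps):
    """Assumed merge rule: join clusters that have any pair of points within eps."""
    merged = [list(c) for c in a]
    for cb in b:
        cb = list(cb)
        keep = []
        for ca in merged:
            if any(dist(p, q) <= eps for p in ca for q in cb):
                cb.extend(ca)  # absorb the overlapping cluster
            else:
                keep.append(ca)
        keep.append(cb)
        merged = keep
    return merged


def aggregate(local_models, eps):
    """Combine models pairwise, level by level, as in a binary aggregation tree."""
    level = list(local_models)
    while len(level) > 1:
        nxt = [merge_models(level[i], level[i + 1], eps)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired model moves up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else []


# Three hypothetical sites; the first two hold halves of the same dense region.
sites = [
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],
    [(1.5, 0.0), (2.0, 0.0), (2.5, 0.0)],
    [(10.0, 10.0), (10.5, 10.0), (11.0, 10.0)],
]
models = [dbscan(s, eps=0.6, min_pts=2) for s in sites]
global_model = aggregate(models, eps=0.6)
print([len(c) for c in global_model])
```

Shipping every cluster point up the tree, as this sketch does, is the simplest possible local model; a practical system would presumably send a compact summary of each cluster instead, which is where the model representation described in the paper would come in.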