Distributed data clustering in multi-dimensional peer-to-peer networks

  • Authors:
  • Stefano Lodi; Gianluca Moro; Claudio Sartori

  • Affiliations:
  • University of Bologna, Viale Risorgimento, Bologna, Italy; Via Venezia, Cesena (FC), Italy; University of Bologna, Viale Risorgimento, Bologna, Italy

  • Venue:
  • ADC '10 Proceedings of the Twenty-First Australasian Conference on Database Technologies - Volume 104
  • Year:
  • 2010

Abstract

Several algorithms have recently been developed for distributed data clustering. They are applied when data cannot be concentrated on a single machine, for instance because of privacy concerns, network bandwidth limitations, or the sheer volume of distributed data. Deployed and research peer-to-peer systems have proven able to manage very large databases spread across thousands of personal computers, which makes them a concrete solution for the forthcoming distributed database systems to be used in large grid computing networks and in clustering database management systems. Current distributed data clustering algorithms cannot be applied to such networks because they expect data to be organized according to traditional distributed database management systems, where the distribution of the relational schema is planned a priori in the design phase. In this paper we describe methods to cluster distributed data across peer-to-peer networks without requiring any costly reorganization of data, which would be infeasible in such large and dynamic overlay networks, and without reducing their performance in message routing and query processing. We compare the data clustering quality and efficiency of three multi-dimensional peer-to-peer systems according to two well-known clustering techniques.