Approximated clustering of distributed high-dimensional data

Authors:
Hans-Peter Kriegel;Peter Kunath;Martin Pfeifle;Matthias Renz
Affiliations:
University of Munich, Germany;University of Munich, Germany;University of Munich, Germany;University of Munich, Germany
Venue:
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2005

Abstract

In many modern application areas, high-dimensional feature vectors are used to model complex real-world objects. Often these objects reside on different local sites. In this paper, we present a general approach for extracting knowledge from distributed data sets without transmitting all data from the local clients to a server site. To keep the transmission cost low, we first determine suitable local feature vector approximations, which are sent to the server. Each feature vector is approximated as precisely as possible within a specified number of bytes. To extract knowledge from these approximations, we introduce a suitable distance function between the feature vector approximations. In a detailed experimental evaluation, we demonstrate the benefits of our new feature vector approximation technique for the important area of distributed clustering. We show that the combination of standard clustering algorithms and our feature vector approximation technique outperforms specialized approaches for distributed clustering when using high-dimensional feature vectors.
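The abstract does not spell out the approximation scheme or the distance function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain uniform scalar quantization as a stand-in: each client spends a fixed byte budget per vector, split evenly across dimensions, and the server compares the resulting grid cells. The function names (`quantize`, `center_distance`, `lower_bound_distance`) and the value range `[lo, hi]` are hypothetical choices made for illustration.

```python
import math


def quantize(vec, n_bytes, lo=0.0, hi=1.0):
    """Approximate vec within n_bytes (assumption: equal bits per dimension).

    Returns the grid-cell index per dimension and the cell width.
    """
    bits = (8 * n_bytes) // len(vec)
    if bits < 1:
        raise ValueError("byte budget too small for one bit per dimension")
    levels = 1 << bits
    width = (hi - lo) / levels
    # Map each coordinate to its cell index, clamped to the valid range.
    cells = [min(levels - 1, max(0, int((x - lo) / width))) for x in vec]
    return cells, width


def center_distance(a_cells, b_cells, width):
    """Euclidean distance between the centers of two approximation cells."""
    return width * math.sqrt(sum((a - b) ** 2 for a, b in zip(a_cells, b_cells)))


def lower_bound_distance(a_cells, b_cells, width):
    """Smallest Euclidean distance any two points in the two cells can have."""
    gaps = (max(0, abs(a - b) - 1) for a, b in zip(a_cells, b_cells))
    return width * math.sqrt(sum(g * g for g in gaps))


# Client side: two 2-dimensional vectors, one byte each (4 bits per dimension).
p_cells, w = quantize([0.1, 0.9], n_bytes=1)
q_cells, _ = quantize([0.1, 0.1], n_bytes=1)

# Server side: only the cell indices are available.
print(center_distance(p_cells, q_cells, w))
print(lower_bound_distance(p_cells, q_cells, w))
```

Under these assumptions, a standard clustering algorithm on the server could use either function in place of the exact distance. The lower bound never exceeds the true distance, while the center distance is a point estimate whose error shrinks as the byte budget grows.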