In many modern application domains, high-dimensional feature vectors are used to model complex real-world objects. Often these objects reside on different local sites. In this paper, we present a general approach for extracting knowledge from distributed data sets without transmitting all data from the local clients to a server site. To keep the transmission cost low, we first determine suitable local feature vector approximations, which are sent to the server. Each feature vector is approximated as precisely as possible within a specified number of bytes. To extract knowledge from these approximations, we introduce a suitable distance function between feature vector approximations. In a detailed experimental evaluation, we demonstrate the benefits of our new feature vector approximation technique for the important area of distributed clustering. We show that the combination of standard clustering algorithms and our feature vector approximation technique outperforms specialized approaches for distributed clustering when using high-dimensional feature vectors.
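The abstract does not spell out the approximation scheme or the distance function, so the following is only an illustrative sketch of the general idea, not the method of the paper. It assumes a plain uniform scalar quantization: a byte budget is split evenly across dimensions, each coordinate is replaced by a grid cell index, and the distance between two approximations is taken as the Euclidean distance between the cell centers. The function names `approximate`, `reconstruct`, and `approx_distance`, and the value range `[lo, hi]`, are hypothetical.

```python
import math


def approximate(vec, budget_bytes, lo=0.0, hi=1.0):
    """Quantize a feature vector so it fits into budget_bytes.

    Assumption (not from the paper): every dimension receives the same
    number of bits, and all coordinates lie in the range [lo, hi].
    Returns (cell indices, bits per dimension).
    """
    bits = (budget_bytes * 8) // len(vec)
    if bits < 1:
        raise ValueError("budget too small for one bit per dimension")
    levels = 1 << bits
    cells = []
    for x in vec:
        # Clamp to [lo, hi], scale to [0, 1], then map to a cell index.
        t = (min(max(x, lo), hi) - lo) / (hi - lo)
        cells.append(min(int(t * levels), levels - 1))
    return cells, bits


def reconstruct(cells, bits, lo=0.0, hi=1.0):
    """Map each cell index back to the center of its grid cell."""
    width = (hi - lo) / (1 << bits)
    return [lo + (c + 0.5) * width for c in cells]


def approx_distance(a, b, lo=0.0, hi=1.0):
    """Euclidean distance between the cell centers of two approximations."""
    return math.dist(reconstruct(*a, lo, hi), reconstruct(*b, lo, hi))


# A client would send only the compact approximation to the server.
a = approximate([0.0, 0.5, 1.0, 0.25], budget_bytes=1)
b = approximate([0.9, 0.1, 0.2, 0.8], budget_bytes=1)
print(a, approx_distance(a, b))
```

Under these assumptions, a server could feed `approx_distance` into any standard distance-based clustering algorithm, and a larger byte budget yields finer grid cells and therefore a smaller reconstruction error per coordinate.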