Medoid queries in large spatial databases

Authors:
Kyriakos Mouratidis;Dimitris Papadias;Spiros Papadimitriou
Affiliations:
Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong;Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong;Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Venue:
SSTD'05 Proceedings of the 9th international conference on Advances in Spatial and Temporal Databases
Year:
2005

Citing 19
Cited 3

The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
On packing R-trees

CIKM '93 Proceedings of the second international conference on Information and knowledge management
Nearest neighbor queries

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Advances in knowledge discovery and data mining

Advances in knowledge discovery and data mining
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Approximation schemes for Euclidean k-medians and related problems

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Accelerating exact k-means algorithms with geometric reasoning

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Distance browsing in spatial databases

ACM Transactions on Database Systems (TODS)
Clustering Algorithms

Clustering Algorithms
The Design and Implementation of Seeded Trees: An Efficient Method for Spatial Joins

IEEE Transactions on Knowledge and Data Engineering
Efficient Cost Models for Spatial Queries Using R-Trees

IEEE Transactions on Knowledge and Data Engineering
Analysis of the Clustering Properties of the Hilbert Space-Filling Curve

IEEE Transactions on Knowledge and Data Engineering
Slot Index Spatial Join

IEEE Transactions on Knowledge and Data Engineering
X-means: Extending K-means with Efficient Estimation of the Number of Clusters

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Generating Seeded Trees from Data Sets

SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification

SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases

Progressive computation of the min-dist optimal-location query

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Using trees to depict a forest

Proceedings of the VLDB Endowment
A branch and bound method for min-dist location selection queries

ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124

Quantified Score

Hi-index	0.00

Visualization

Abstract

Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands).