The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
CIKM '93 Proceedings of the second international conference on Information and knowledge management
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Advances in knowledge discovery and data mining
Advances in knowledge discovery and data mining
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Approximation schemes for Euclidean k-medians and related problems
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Accelerating exact k-means algorithms with geometric reasoning
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Distance browsing in spatial databases
ACM Transactions on Database Systems (TODS)
Clustering Algorithms
The Design and Implementation of Seeded Trees: An Efficient Method for Spatial Joins
IEEE Transactions on Knowledge and Data Engineering
Efficient Cost Models for Spatial Queries Using R-Trees
IEEE Transactions on Knowledge and Data Engineering
Analysis of the Clustering Properties of the Hilbert Space-Filling Curve
IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Generating Seeded Trees from Data Sets
SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases
SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases
Progressive computation of the min-dist optimal-location query
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Using trees to depict a forest
Proceedings of the VLDB Endowment
A branch and bound method for min-dist location selection queries
ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
Hi-index | 0.00 |
Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU and I/O cost. In particular, we exploit the intrinsic grouping properties of the index in order to avoid reading the entire dataset. Furthermore, we apply our framework to solve medoid-aggregate queries, where k is not known in advance; instead, we are asked to compute a medoid set that leads to an average distance close to a user-specified parameter T. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands).