Tree-based partition querying: a methodology for computing medoids in large spatial datasets

Authors:
Kyriakos Mouratidis;Dimitris Papadias;Spiros Papadimitriou
Affiliations:
Singapore Management University, Singapore, Singapore;Hong Kong University of Science and Technology, Kowloon, Hong Kong;IBM T.J. Watson Research Center, Yorktown Heights, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2008

Abstract

Besides traditional domains (e.g., resource allocation, data mining applications), algorithms for medoid computation and related problems will play an important role in numerous emerging fields, such as location-based services and sensor networks. Since the k-medoid problem is NP-hard, all existing work deals with approximate solutions on relatively small datasets. This paper aims at efficient methods for very large spatial databases, motivated by: (1) the high and ever-increasing availability of spatial data, and (2) the need for novel query types and improved services. The proposed solutions exploit the intrinsic grouping properties of a data partition index in order to read only a small part of the dataset. Compared to previous approaches, we achieve results of comparable or better quality at a small fraction of the CPU and I/O costs (seconds as opposed to hours, and tens of node accesses instead of thousands). In addition, we study medoid-aggregate queries, where k is not known in advance, but we are asked to compute a medoid set that leads to an average distance close to a user-specified value. Similarly, medoid-optimization queries aim at minimizing both the number of medoids k and the average distance. We also consider the max version of the aforementioned problems, where the goal is to minimize the maximum (instead of the average) distance between any object and its closest medoid. Finally, we investigate bichromatic and weighted medoid versions for all query types, as well as maximum capacity and dynamic medoids.
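To make the two objectives named in the abstract concrete, the following Python sketch evaluates the average and the maximum distance between each object and its closest medoid, and finds the best medoid set by exhaustive search. It is only an illustration of the problem definition, not the index-based method the paper proposes. The function names (`avg_cost`, `max_cost`, `brute_force_medoids`) and the sample points are hypothetical, and it assumes Euclidean distance and the standard definition in which medoids are chosen among the data points. The exhaustive search is exponential in k, so it is usable only on toy inputs.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def avg_cost(points, medoids):
    """Average distance from each point to its closest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points) / len(points)


def max_cost(points, medoids):
    """Maximum distance from any point to its closest medoid (the 'max' version)."""
    return max(min(dist(p, m) for m in medoids) for p in points)


def brute_force_medoids(points, k, cost=avg_cost):
    """Try every k-subset of the data points and keep the cheapest one.

    Illustrative only: this enumerates C(n, k) candidate sets, which is
    why practical methods settle for approximate answers.
    """
    return min(combinations(points, k), key=lambda ms: cost(points, ms))


if __name__ == "__main__":
    # Hypothetical toy dataset: two small groups of points.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    best = brute_force_medoids(pts, 2)
    print(best, avg_cost(pts, best), max_cost(pts, best))
```

Passing `cost=max_cost` to `brute_force_medoids` switches the search to the max version of the problem without changing anything else.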