An Efficient Approximate Algorithm for the 1-Median Problem in Metric Spaces

Authors:
D. Cantone; G. Cincotti; A. Ferro; A. Pulvirenti
Venue:
SIAM Journal on Optimization
Year:
2005

Cited by: 6

Antipole Tree Indexing to Support Range Search and K-Nearest Neighbor Search in Metric Spaces
IEEE Transactions on Knowledge and Data Engineering

Efficiently answering top-k typicality queries on large databases
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

Top-k typicality queries and efficient query answering methods on large databases
The VLDB Journal — The International Journal on Very Large Data Bases

Distributed antipole clustering for efficient data search and management in Euclidean and metric spaces
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Multiple-Winners randomized tournaments with consensus for optimization problems in generic metric spaces
WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms

On approximating metric 1-median in sublinear time
Information Processing Letters

Abstract

We propose a simple and natural linear-time randomized algorithm for the approximate 1-median selection problem in metric spaces. The 1-median of a finite subset S of a metric space is the element of S that minimizes the average distance to the remaining points of S. This problem is central to most applications that cluster data in metric spaces, and it also arises in several bioinformatics algorithms. The only linear-time approximation algorithm for the 1-median problem that provably works in any metric space, without going through a Euclidean space, was proposed by Indyk [Proceedings of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, 1999, pp. 428-432]. However, Indyk's algorithm, which relies on sufficiently large samples, turns out not to be practical, and the same holds for its heuristic variants that use smaller samples. The algorithm we propose has a simple and efficient implementation and performs better than Indyk's algorithm in practice. On the other hand, whereas Indyk's algorithm comes with a guaranteed approximation factor, for our algorithm we can offer only experimental evidence of its precision. We ran extensive experiments on both synthetic and real datasets. The synthetic datasets were generated from uniform and skewed distributions under various metrics; the real datasets were extracted from official real-world databases available on the web. We report successful results for several applications in bioinformatics and for various classes of approximate search queries.
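
To make the definition of the 1-median concrete, the following minimal Python sketch computes it exactly by brute force and, for contrast, with a naive sampling heuristic. It is an illustration only: it is neither Indyk's algorithm nor the algorithm proposed in the paper, neither of which the abstract specifies. The function names `exact_one_median` and `sampled_one_median` and the parameter `sample_size` are hypothetical, and the sampling variant carries no approximation guarantee.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def exact_one_median(points: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the element of `points` with the smallest total distance to the rest.

    Brute force over all pairs: O(n^2) distance computations.
    """
    if not points:
        raise ValueError("points must be non-empty")
    # Minimizing the sum is equivalent to minimizing the average, and
    # dist(p, p) == 0 in a metric space, so including p itself is harmless.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def sampled_one_median(
    points: Sequence[T],
    dist: Callable[[T, T], float],
    sample_size: int,
    rng: Optional[random.Random] = None,
) -> T:
    """Heuristic (assumption, not from the paper): score each candidate
    against a random sample only, using O(n * sample_size) distances."""
    if not points:
        raise ValueError("points must be non-empty")
    rng = rng or random.Random()
    sample = rng.sample(list(points), min(sample_size, len(points)))
    return min(points, key=lambda p: sum(dist(p, q) for q in sample))


if __name__ == "__main__":
    pts = [0, 1, 2, 3, 100]
    metric = lambda a, b: abs(a - b)
    print(exact_one_median(pts, metric))
    print(sampled_one_median(pts, metric, sample_size=3, rng=random.Random(0)))
```

Because both functions take an arbitrary `dist` callable, they work in any metric space (edit distance on strings, for example) without embedding the points in a Euclidean space. The quadratic cost of the exact version is what motivates linear-time approximations.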