We propose a simple and natural linear-time randomized algorithm for the approximate 1-median selection problem in metric spaces. The 1-median of a finite subset S of a metric space is the element of S that minimizes the average distance to the remaining points of S. This problem is extremely important in most applications that rely on clustering in metric spaces, and it also arises in several bioinformatics algorithms. The only linear-time approximation algorithm for the 1-median problem that provably works in any metric space, without passing through a Euclidean space, was proposed by Indyk in [Proceedings of the 31st Annual ACM Symposium on Theory of Computing, Atlanta, 1999, pp. 428-432]. However, Indyk's algorithm, which relies on sufficiently large samples, turns out not to be a practical solution, and the same holds for its heuristic variants that use smaller samples. The algorithm we propose has a simple and efficient implementation and performs better than Indyk's algorithm in practice. On the other hand, while Indyk's algorithm comes with a guaranteed approximation factor, for our algorithm we can offer only experimental evidence of its precision. Extensive experiments were performed on both synthetic and real datasets. Synthetic datasets were generated with uniform and skewed distributions, using various metrics. Real datasets were extracted from official real-world databases available on the web. We report successful results of the proposed algorithm for several applications in bioinformatics and for various classes of approximate search queries.
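To make the definition of the 1-median concrete, the following minimal Python sketch computes it exactly by brute force and then approximates it by scoring candidates against a random sample only. This is neither Indyk's algorithm nor the algorithm proposed in the abstract, whose details are not given here; the function names (`total_distance`, `exact_one_median`, `sampled_one_median`) and the `sample_size` parameter are hypothetical and chosen purely for illustration. Only a distance function is assumed, so the sketch applies to any metric space.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def total_distance(candidate: T, points: Sequence[T], dist: Metric) -> float:
    """Sum of distances from candidate to every point in points."""
    return sum(dist(candidate, p) for p in points)


def exact_one_median(points: Sequence[T], dist: Metric) -> T:
    """Brute-force 1-median using O(n^2) distance evaluations.

    Minimizing the sum of distances to all points is equivalent to
    minimizing the average distance to the remaining points, because
    dist(c, c) == 0 and the divisor n - 1 is the same for every candidate.
    """
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda c: total_distance(c, points, dist))


def sampled_one_median(
    points: Sequence[T], dist: Metric, sample_size: int, rng: random.Random
) -> T:
    """Illustrative heuristic (an assumption, not the paper's method).

    Every candidate is scored against a random sample instead of the
    whole set, which costs O(n * sample_size) distance evaluations.
    No approximation guarantee is claimed for this sketch.
    """
    if not points:
        raise ValueError("points must be non-empty")
    sample = rng.sample(list(points), min(sample_size, len(points)))
    return min(points, key=lambda c: total_distance(c, sample, dist))


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 3.0, 100.0]
    metric = lambda a, b: abs(a - b)
    print(exact_one_median(pts, metric))
    print(sampled_one_median(pts, metric, 3, random.Random(0)))
```

The brute-force version is quadratic in the number of points, which is what motivates linear-time approximations; the sampled version trades accuracy for fewer distance evaluations, and its result depends on the random sample drawn.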