One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, and distributed protocols is the classification of data according to some clustering rule. The data are often noisy, so even approximate classification is of great importance. The difficulty of such classification stems from the fact that the data usually have many incomparable attributes, which leads to clustering problems in high-dimensional spaces. Because they require measuring the distance between every pair of data points, standard algorithms for computing exact clustering solutions use quadratic or “nearly quadratic” running time; i.e., O(d·n^(2−α(d))) time, where n is the number of data points, d is the dimension of the space, and α(d) approaches 0 as d grows. In this paper, we show, for three fairly natural clustering rules, that an approximate solution can be computed much more efficiently. More specifically, for agglomerative clustering (used, for example, in the Alta Vista™ search engine), for the clustering defined by sparse partitions, and for a clustering based on minimum spanning trees, we derive randomized (1 + ε)-approximation algorithms with running time Õ(d^2 · n^(2−γ)), where γ > 0 depends only on the approximation parameter ε and is independent of the dimension d.
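To make the quadratic baseline concrete, the following is a minimal sketch of standard single-linkage agglomerative clustering — the naive method whose pairwise-distance scans cost roughly O(d·n^2) per pass. It is an illustration of the problem setting only, not the paper's subquadratic (1 + ε)-approximation algorithm; the function name and parameters are illustrative.

```python
import math

def agglomerative(points, k):
    """Naive bottom-up (agglomerative) clustering: repeatedly merge the
    two closest clusters until only k clusters remain.

    Inter-cluster distance is the minimum pairwise point distance
    (single linkage).  Each merge step scans all cluster pairs, so the
    overall cost is (at least) quadratic in the number of points --
    the baseline that subquadratic approximation schemes improve on.
    """
    clusters = [[p] for p in points]

    def cluster_dist(c1, c2):
        # Single-linkage: closest pair of points across the two clusters.
        return min(math.dist(p, q) for p in c1 for q in c2)

    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

For example, on the points (0,0), (0,1), (10,10), (10,11) with k = 2, the two nearby pairs are merged into one cluster each.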