The Johnson-Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n)-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings.

More specifically, we address the following clustering problem. Given n points in a larger set (e.g., ℝ^d) endowed with a distance function (e.g., L_2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and ℝ^d either with L_1 distance, or with L_2 distance, or with the square of L_2 distance. In all these settings, the best previous results were constant factor approximation guarantees.

We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
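To make the two central objects of the abstract concrete, here is a minimal Python sketch. It illustrates the classical Euclidean setting only: a Gaussian random linear map in the spirit of the Johnson-Lindenstrauss lemma, plus the clustering objective (the sum of point-to-center distances). It is not the paper's Hamming-cube transformation or its approximation scheme. The function names, the Gaussian choice of matrix, and the dimensions in the demo are illustrative assumptions.

```python
import math
import random


def l2_distance(u, v):
    """Euclidean (L_2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def random_projection(points, target_dim, rng):
    """Apply one random linear map from R^d to R^target_dim to every point.

    Assumption: entries are i.i.d. Gaussian with variance 1/target_dim, a
    common choice that preserves squared lengths in expectation.
    """
    dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, scale) for _ in range(target_dim)]
              for _ in range(dim)]
    return [
        [sum(p[i] * matrix[i][j] for i in range(dim)) for j in range(target_dim)]
        for p in points
    ]


def max_distortion(points, projected):
    """Largest relative change in any pairwise distance after projection."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            before = l2_distance(points[i], points[j])
            after = l2_distance(projected[i], projected[j])
            worst = max(worst, abs(after / before - 1.0))
    return worst


def clustering_cost(points, centers, labels, distance=l2_distance):
    """Objective from the abstract: sum of distances from each point to the
    center of the cluster it is assigned to (labels[i] indexes centers)."""
    return sum(distance(p, centers[c]) for p, c in zip(points, labels))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(20)]
    embedded = random_projection(data, 150, rng)
    print("max relative distortion:", max_distortion(data, embedded))
```

Passing a different `distance` function (for example, Hamming distance on 0/1 vectors or squared L_2 distance) to `clustering_cost` gives the objective for the other settings the abstract lists; the projection itself would need a different construction in those settings.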