Pairwise Data Clustering by Deterministic Annealing
IEEE Transactions on Pattern Analysis and Machine Intelligence
Dimension reduction and visualization of large high-dimensional data via interpolation
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
We describe data analytics on large systems using a suite of robust parallel algorithms that run on both clouds and HPC systems. We apply these algorithms both to data defined in a vector space and to data for which only pairwise distances between points are defined. For the pairwise case, where direct algorithms cost O(N^2) for N points, we introduce new O(N log N) algorithms. We show the value of visualization by dimension reduction for steering complex analytics, and we illustrate the value of deterministic annealing for obtaining relatively fast, robust algorithms. We apply these methods to metagenomics applications.
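To make the deterministic annealing idea concrete, the following is a minimal, serial sketch of textbook annealed clustering for the vector-space case. It is not the parallel implementation described in the abstract and does not cover the O(N log N) pairwise method. The function name `da_cluster`, its parameters, the cooling schedule, and the starting-temperature heuristic are all assumptions made for illustration.

```python
import math
import random


def da_cluster(points, k, t_end=1e-3, cooling=0.9, inner_iters=20, seed=0):
    """Illustrative deterministic annealing clustering (hypothetical helper).

    points: list of equal-length coordinate lists.
    Returns k cluster centers as lists of floats.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]

    # All centers start near the global mean; tiny perturbations let them
    # separate once the temperature is low enough.
    centers = [[m + 1e-3 * rng.uniform(-1, 1) for m in mean] for _ in range(k)]

    # Assumed heuristic: start at a temperature on the order of the data variance.
    t = 2.0 * max(sum((p[d] - mean[d]) ** 2 for p in points) / n
                  for d in range(dim))
    t = max(t, 10 * t_end)

    while t > t_end:
        for _ in range(inner_iters):
            sums = [[0.0] * dim for _ in range(k)]
            weights = [0.0] * k
            for p in points:
                d2 = [sum((p[d] - c[d]) ** 2 for d in range(dim))
                      for c in centers]
                dmin = min(d2)
                # Soft (Gibbs) assignment at temperature t; subtracting dmin
                # keeps the exponentials numerically stable.
                w = [math.exp(-(x - dmin) / t) for x in d2]
                z = sum(w)
                for j in range(k):
                    r = w[j] / z
                    weights[j] += r
                    for d in range(dim):
                        sums[j][d] += r * p[d]
            # Each center moves to the responsibility-weighted mean.
            centers = [
                [sums[j][d] / weights[j] for d in range(dim)]
                if weights[j] > 0 else centers[j]
                for j in range(k)
            ]
        t *= cooling  # geometric cooling schedule (an assumption)
    return centers
```

At high temperature every point is shared almost equally among the centers, which smooths the cost function. As the temperature falls, the assignments harden toward ordinary nearest-center clustering. This gradual hardening is what makes annealed methods less sensitive to initialization than a cold start.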