Deterministic annealing and robust scalable data mining for the data deluge

  • Authors:
  • Geoffrey C. Fox

  • Affiliations:
  • Indiana University, Bloomington, IN, USA

  • Venue:
  • Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities
  • Year:
  • 2011

Abstract

We describe data analytics on large systems using a suite of robust parallel algorithms running on both clouds and HPC systems. We apply this both to cases where the data is defined in a vector space and to cases where only pairwise distances between points are defined. We introduce new O(N log N) algorithms for pairwise cases, where direct algorithms are O(N^2) for N points. We show the value of visualization using dimension reduction for steering complex analytics and illustrate the value of deterministic annealing for relatively fast robust algorithms. We apply these methods to metagenomics applications.
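
For readers unfamiliar with deterministic annealing, the following is a minimal sketch of the general idea for the vector-space case only. It is not the paper's implementation, and it does not cover the pairwise-distance or O(N log N) methods. The function name da_cluster, its parameters, the starting-temperature heuristic, and the small deterministic nudge are all assumptions made for illustration. Points are assigned softly to centers with weights proportional to exp(-d^2 / T), and the temperature T is lowered gradually so that centers separate only when the data supports a split.

```python
import math

def da_cluster(points, k, t_end=1e-3, cooling=0.9, inner_iters=20, nudge=1e-6):
    """Illustrative deterministic-annealing clustering (hypothetical helper)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # Assumed heuristic: start at twice the largest squared distance from the mean.
    t = 2.0 * max(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in points)
    centers = [list(mean) for _ in range(k)]  # all centers coincide while hot

    while t > t_end:
        # Tiny deterministic offsets let coincident centers separate
        # once the temperature is low enough to support a split.
        for j, c in enumerate(centers):
            for d in range(dim):
                c[d] += nudge * (j + 1) * (d + 1)
        for _ in range(inner_iters):
            sums = [[0.0] * dim for _ in range(k)]
            weights = [0.0] * k
            for p in points:
                d2 = [sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centers]
                lo = min(d2)  # subtract the minimum to avoid underflow to 0/0
                w = [math.exp(-(x - lo) / t) for x in d2]
                total = sum(w)
                for j in range(k):
                    r = w[j] / total  # soft assignment of p to center j
                    weights[j] += r
                    for d in range(dim):
                        sums[j][d] += r * p[d]
            # Move each center to the weighted mean of the points it attracts.
            for j in range(k):
                if weights[j] > 0.0:
                    centers[j] = [s / weights[j] for s in sums[j]]
        t *= cooling  # lower the temperature
    return centers
```

Called on two well-separated groups of points with k=2, the returned centers should end up close to the two group means. Because there is no random initialization, repeated runs give the same result, which is one reason annealing of this kind is described as robust.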