Data intensive analysis on the Gordon high performance data and compute system

  • Authors:
  • Robert S. Sinkovits; Pietro Cicotti; Shawn Strande; Mahidhar Tatineni; Paul Rodriguez; Nicole Wolter; Natasha Balac

  • Affiliations:
  • University of California, San Diego, La Jolla, CA, USA (all authors)

  • Venue:
  • Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2011

Abstract

The Gordon data-intensive computing system was designed to handle problems with large memory requirements that cannot easily be solved using standard workstations or distributed-memory supercomputers. We describe the unique features of Gordon that make it ideally suited for data mining and knowledge discovery applications: memory aggregation using the vSMP software solution from ScaleMP, I/O nodes containing 4 TB of low-latency flash memory, and a high-performance parallel file system with 4 PB capacity. We also demonstrate how a number of standard data mining tools (e.g., MATLAB, WEKA, R) can be used effectively on Dash, an early prototype of the full Gordon system.