Data intensive analysis on the Gordon high performance data and compute system

Authors:
Robert S. Sinkovits; Pietro Cicotti; Shawn Strande; Mahidhar Tatineni; Paul Rodriguez; Nicole Wolter; Natasha Balac
Affiliations:
University of California, San Diego, La Jolla, CA, USA (all authors)
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

The Gordon data-intensive computing system was designed to handle problems with large memory requirements that cannot easily be solved on standard workstations or distributed-memory supercomputers. We describe the unique features of Gordon that make it ideally suited for data mining and knowledge discovery applications: memory aggregation using the vSMP software solution from ScaleMP, I/O nodes containing 4 TB of low-latency flash memory, and a high-performance parallel file system with 4 PB of capacity. We also demonstrate how a number of standard data mining tools (e.g., MATLAB, WEKA, R) can be used effectively on Dash, an early prototype of the full Gordon system.