Data exploration of turbulence simulations using a database cluster

Authors:
Eric Perlman; Randal Burns; Yi Li; Charles Meneveau
Affiliations:
Johns Hopkins University, Baltimore, MD (all authors)
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Abstract

We describe a new environment for exploring turbulent flows that uses a cluster of databases to store complete histories of Direct Numerical Simulation (DNS) results. This enables spatial and temporal exploration of high-resolution data that were traditionally too large to store and too computationally expensive to regenerate on demand. We analyze these data directly on the database nodes, which minimizes network traffic. The low network demands enable us to provide public access to this experimental platform and its datasets through Web services. This paper details the system design and implementation. Specifically, we focus on hierarchical spatial indexing, cache-sensitive spatial scheduling of batch workloads, localizing computation through data partitioning, and load-balancing techniques that minimize data movement. We present real examples of how scientists use the system to perform high-resolution turbulence research from standard desktop computing environments.
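The abstract names hierarchical spatial indexing, spatial scheduling of batch workloads, and data partitioning but gives no details. The sketch below illustrates one common way to realize these ideas. Its assumptions are not taken from the paper: a Morton (Z-order) space-filling curve, a grid of 1024 points per axis, cubic blocks of side 2**block_bits, and a static split of the key range across nodes. The names `morton3`, `block_key`, `schedule_batch`, and `node_for_block` are hypothetical. The system described may use a different curve, block size, or load-balancing policy.

```python
from collections import defaultdict

GRID_BITS = 10  # assumption: 1024 grid points per axis (2**10), for illustration only


def morton3(x: int, y: int, z: int, bits: int = GRID_BITS) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order (Morton) key.

    Bit i of x lands at position 3*i, of y at 3*i + 1, of z at 3*i + 2.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def block_key(x: int, y: int, z: int, block_bits: int) -> int:
    """Key of the cubic block (side 2**block_bits) that contains grid point (x, y, z).

    Dropping the low 3*block_bits bits of a Morton key yields the key of the
    enclosing octree node, which is what makes the index hierarchical.
    """
    return morton3(x, y, z) >> (3 * block_bits)


def schedule_batch(points, block_bits: int):
    """Group a batch of point queries by block and order the blocks by key.

    Each block is then read once per batch, and consecutive blocks tend to be
    spatially close, which favors cache and disk locality.
    Returns a list of (block key, [indices into points]) pairs.
    """
    groups = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        groups[block_key(x, y, z, block_bits)].append(idx)
    return sorted(groups.items())


def node_for_block(key: int, block_bits: int, num_nodes: int,
                   grid_bits: int = GRID_BITS) -> int:
    """Assign a block to a node by cutting the key range into equal contiguous parts.

    Contiguous Morton ranges tend to cover spatially compact regions, so
    neighboring blocks usually end up on the same node.
    """
    total_blocks = 1 << (3 * (grid_bits - block_bits))
    return key * num_nodes // total_blocks


if __name__ == "__main__":
    queries = [(9, 3, 17), (1, 2, 3), (10, 2, 16), (700, 512, 33)]
    for key, idxs in schedule_batch(queries, block_bits=3):
        print(key, idxs, "-> node", node_for_block(key, block_bits=3, num_nodes=4))
```

Sorting a batch by block key turns scattered point queries into a mostly sequential sweep over blocks, and queries that fall in the same block share a single read. Splitting by equal key ranges balances the number of blocks per node but not the query load, which is the part that dynamic load-balancing techniques would need to address.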