Scientific instruments and simulations generate increasingly large datasets, changing the way we do science. We propose a system, which we call the data-intensive computer, for computing with petascale datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database, and a set of computing servers running the data-intensive operating system, which turns the database into a layer in the memory hierarchy of the data-intensive computer. The data-intensive operating system is data-object-oriented: the abstract programming model of a sequential file, central to traditional operating systems, is replaced with system-level support for high-level data objects such as multi-dimensional arrays, sparse arrays, and graphs. User application programs will be compiled into code that is executed both on the HPC cluster and inside the database. The data-intensive operating system is, however, non-local, allowing remote applications to execute code inside the database. This model supports a collaborative environment in which a large dataset is typically created and processed by a large group of users. We are developing a software library, MPI-DB, which is a prototype of the data-intensive operating system. It is currently being used by the Turbulence group at JHU to store simulation output in the database and to perform simulations that refine previously stored results.
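To make the data-object-oriented idea concrete, the following is a minimal sketch of an array object whose storage is a database table rather than a sequential file. It is not the MPI-DB API: the class `ChunkedArray2D`, the `tiles` table, and the tile layout are all hypothetical, and an in-memory SQLite database stands in for the massively parallel database.

```python
import sqlite3
import struct


class ChunkedArray2D:
    """Hypothetical array data object: a 2-D array of floats stored as
    fixed-size tiles in a database table instead of a sequential file."""

    def __init__(self, conn, name, shape, tile):
        self.conn, self.name, self.shape, self.tile = conn, name, shape, tile
        conn.execute(
            "CREATE TABLE IF NOT EXISTS tiles ("
            "array TEXT, ti INTEGER, tj INTEGER, data BLOB, "
            "PRIMARY KEY (array, ti, tj))"
        )

    def _tile_dims(self, ti, tj):
        # Tiles on the lower and right edges may be smaller than `tile`.
        rows = min(self.tile, self.shape[0] - ti * self.tile)
        cols = min(self.tile, self.shape[1] - tj * self.tile)
        return rows, cols

    def store(self, values):
        """Split a list of rows into tiles and write each tile as one table row."""
        t = self.tile
        n_ti = -(-self.shape[0] // t)  # ceiling division
        n_tj = -(-self.shape[1] // t)
        for ti in range(n_ti):
            for tj in range(n_tj):
                rows, cols = self._tile_dims(ti, tj)
                flat = [values[ti * t + r][tj * t + c]
                        for r in range(rows) for c in range(cols)]
                blob = struct.pack(f"<{len(flat)}d", *flat)
                self.conn.execute(
                    "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)",
                    (self.name, ti, tj, blob),
                )
        self.conn.commit()

    def read(self, r0, r1, c0, c1):
        """Return the subarray [r0:r1, c0:c1], fetching only overlapping tiles."""
        t = self.tile
        out = [[0.0] * (c1 - c0) for _ in range(r1 - r0)]
        cursor = self.conn.execute(
            "SELECT ti, tj, data FROM tiles WHERE array = ? "
            "AND ti BETWEEN ? AND ? AND tj BETWEEN ? AND ?",
            (self.name, r0 // t, (r1 - 1) // t, c0 // t, (c1 - 1) // t),
        )
        for ti, tj, blob in cursor:
            rows, cols = self._tile_dims(ti, tj)
            flat = struct.unpack(f"<{rows * cols}d", blob)
            for r in range(rows):
                gr = ti * t + r  # global row index
                if not r0 <= gr < r1:
                    continue
                for c in range(cols):
                    gc = tj * t + c  # global column index
                    if c0 <= gc < c1:
                        out[gr - r0][gc - c0] = flat[r * cols + c]
        return out


# Usage: store a small 5 x 7 field, then read back a 2 x 3 window.
conn = sqlite3.connect(":memory:")
field = ChunkedArray2D(conn, "velocity_u", shape=(5, 7), tile=3)
field.store([[10.0 * i + j for j in range(7)] for i in range(5)])
sub = field.read(2, 4, 2, 5)
```

The application addresses the array by index range and never sees a file offset; which tiles are fetched is decided by the storage layer. A real system of the kind described above would additionally distribute tiles across database nodes and run computation next to them, which this sketch does not attempt.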