ISABELA-QA: query-driven analytics with ISABELA-compressed extreme-scale scientific data

Authors:
Sriram Lakshminarasimhan;John Jenkins;Isha Arkatkar;Zhenhuan Gong;Hemanth Kolla;Seung-Hoe Ku;Stephane Ethier;Jackie Chen;C. S. Chang;Scott Klasky;Robert Latham;Robert Ross;Nagiza F. Samatova
Affiliations:
North Carolina State University, NC and Oak Ridge National Laboratory, Oak Ridge, TN;North Carolina State University, NC and Oak Ridge National Laboratory, Oak Ridge, TN;North Carolina State University, NC and Oak Ridge National Laboratory, Oak Ridge, TN;North Carolina State University, NC;Sandia National Laboratory, Livermore, CA;New York University, New York, NY;Princeton Plasma Physics Laboratory, Princeton, NJ;Sandia National Laboratory, Livermore, CA;New York University, New York, NY;Oak Ridge National Laboratory, Oak Ridge, TN;Argonne National Laboratory, Argonne, IL;Argonne National Laboratory, Argonne, IL;North Carolina State University, NC and Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2011

Citing (15):
An introduction to spatial database systems. The VLDB Journal — The International Journal on Very Large Data Bases - Spatial Database Systems
OpenMP: An Industry-Standard API for Shared-Memory Programming. IEEE Computational Science & Engineering
Compressing Bitmap Indexes for Faster Search Operations. SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
Byte-aligned bitmap compression. DCC '95 Proceedings of the Conference on Data Compression
Compressing Bitmap Indices by Data Reorganization. ICDE '05 Proceedings of the 21st International Conference on Data Engineering
DEX: increasing the capability of scientific data analysis pipelines by using efficient bitmap indices to accelerate scientific visualization. SSDBM'2005 Proceedings of the 17th international conference on Scientific and statistical database management
Bitmap Index Design Choices and Their Performance Implications. IDEAS '07 Proceedings of the 11th International Database Engineering and Applications Symposium
Compressing large boolean matrices using reordering techniques. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
On the performance of bitmap indices for high cardinality attributes. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Column-stores vs. row-stores: how different are they really? Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Breaking the memory wall in MonetDB. Communications of the ACM - Surviving the data deluge
RLH: Bitmap compression technique based on run-length and Huffman encoding. Information Systems
A demonstration of SciDB: a science-oriented DBMS. Proceedings of the VLDB Endowment
Overview of sciDB: large scale array storage, processing and analysis. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data. Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I

Cited by (3):
ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization. Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
Combining in-situ and in-transit processing to enable extreme-scale scientific analysis. SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Using cross-layer adaptations for dynamic data management in large scale coupled scientific workflows. SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Abstract

Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top priority. Increasing simulation output sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed data, rather than the original full-size data, is a promising strategy for meeting storage- and I/O-bound application challenges. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatio-temporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is lightweight; it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, for the variables that satisfy user-specified range queries, a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatio-temporal floating-point data in double or single precision, while offering a lightweight memory and disk storage footprint with parallel, scalable, multi-node, multi-core, GPU-based query processing.
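
To illustrate what a value-retrieving range query with a lightweight index can look like, here is a minimal Python sketch. It is not the ISABELA-QA algorithm, which the abstract does not detail. It assumes a hypothetical index that stores only the minimum and maximum of each fixed-size window, and it works on a plain list where a real system would decompress each candidate window. The names `build_window_index`, `range_query`, and the `window` parameter are invented for this example.

```python
def build_window_index(data, window=4):
    """Hypothetical lightweight index: one (min, max) pair per fixed-size window."""
    return [
        (min(data[start:start + window]), max(data[start:start + window]))
        for start in range(0, len(data), window)
    ]


def range_query(data, index, lo, hi, window=4):
    """Return (position, value) pairs with lo <= value <= hi.

    Windows whose [min, max] interval cannot overlap [lo, hi] are skipped,
    so only candidate windows are scanned. In a compressed store, this is
    the point where a candidate window would be decompressed (assumption).
    """
    hits = []
    for w, (w_min, w_max) in enumerate(index):
        if w_max < lo or w_min > hi:
            continue  # no value in this window can satisfy the query
        start = w * window
        for offset, value in enumerate(data[start:start + window]):
            if lo <= value <= hi:
                hits.append((start + offset, value))
    return hits


data = [0.1, 0.2, 0.3, 0.4, 5.0, 6.0, 7.0, 8.0, 0.5, 5.5]
index = build_window_index(data, window=4)
result = range_query(data, index, 5.0, 6.0, window=4)
print(result)
```

The query returns the matching values together with their positions, not just the regions that contain them. The first window is never scanned because its maximum lies below the query range.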