Efficient analytics of scientific data from extreme-scale simulations is quickly becoming a top priority. Increasing simulation output sizes demand a paradigm shift in how analytics is conducted. In this paper, we argue that query-driven analytics over compressed data, rather than the original full-size data, is a promising strategy for meeting storage- and I/O-bound application challenges. As a proof of principle, we propose a parallel query processing engine, called ISABELA-QA, that is designed and optimized for knowledge-priors-driven analytical processing of spatio-temporal, multivariate scientific data that is initially compressed, in situ, by our ISABELA technology. With ISABELA-QA, the total data storage requirement is less than 23%-30% of the original data, which is up to eight-fold less than what existing state-of-the-art data management technologies, which require storing both the original data and the index, could offer. Since ISABELA-QA operates on the metadata generated by our compression technology, its underlying indexing technology for efficient query processing is lightweight: it requires less than 3% of the original data, unlike existing database indexing approaches that require 30%-300% of the original data. Moreover, ISABELA-QA is specifically optimized to retrieve the actual values, rather than spatial regions, for the variables that satisfy user-specified range queries, a functionality that is critical for high-accuracy data analytics. To the best of our knowledge, this is the first technology that enables query-driven analytics over compressed spatio-temporal floating-point data in double or single precision, while offering a lightweight memory and disk storage footprint with parallel, scalable, multi-node, multi-core, GPU-based query processing.