Scientists receive increasingly large volumes of data every day. These data, together with the metadata that describe them, are collected in scientific file repositories. Today's scientists need a data management tool that makes these file repositories accessible and performs a number of exploration steps near-instantly. Current database technology, however, has a long data-to-insight time and does not provide enough interactivity to shorten exploration time. We envision that exploiting metadata helps solve these problems. To this end, we propose a novel query execution paradigm that decomposes query execution into two stages. The first stage processes only metadata; the second stage processes the rest of the data. This lets us exploit metadata to boost interactivity and to transparently ingest only the data that each query requires. Preliminary experiments show that up-front ingestion time is reduced by orders of magnitude, while query performance remains similar. Motivated by these results, we identify the challenges on the way from the new paradigm to efficient interactive data exploration.
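The two-stage idea can be illustrated with a minimal Python sketch. It is not the system described in the abstract: the names `FileMeta`, `TwoStageEngine`, `stage_one`, `stage_two`, the metadata fields `station` and `year`, and the in-memory `FAKE_REPO` standing in for a file repository are all hypothetical, chosen only for illustration. The first stage filters a metadata catalog without touching any file; the second stage ingests only the files that survived the first stage, on first use, and evaluates the rest of the query on them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Metadata record for one repository file (hypothetical schema)."""
    path: str
    station: str
    year: int


class TwoStageEngine:
    """Toy engine: metadata-only stage first, then lazy ingestion of actual data."""

    def __init__(self, catalog: List[FileMeta],
                 loader: Callable[[str], List[float]]) -> None:
        self.catalog = catalog                      # metadata, assumed cheap to load up front
        self.loader = loader                        # reads one file's values on demand
        self.ingested: Dict[str, List[float]] = {}  # files ingested so far

    def stage_one(self, meta_pred: Callable[[FileMeta], bool]) -> List[FileMeta]:
        # Stage 1: evaluate the metadata predicate only; no file is opened here.
        return [m for m in self.catalog if meta_pred(m)]

    def stage_two(self, files: List[FileMeta],
                  value_pred: Callable[[float], bool]) -> List[float]:
        # Stage 2: ingest only the files selected by stage 1, then filter values.
        result: List[float] = []
        for m in files:
            if m.path not in self.ingested:
                self.ingested[m.path] = self.loader(m.path)
            result.extend(v for v in self.ingested[m.path] if value_pred(v))
        return result

    def query(self, meta_pred: Callable[[FileMeta], bool],
              value_pred: Callable[[float], bool]) -> List[float]:
        return self.stage_two(self.stage_one(meta_pred), value_pred)


# In-memory stand-in for a scientific file repository (assumption for the sketch).
FAKE_REPO = {
    "a.dat": [1.0, 5.5, 9.2],
    "b.dat": [2.0, 7.1],
    "c.dat": [8.8],
}
catalog = [
    FileMeta("a.dat", "NL01", 2010),
    FileMeta("b.dat", "NL01", 2011),
    FileMeta("c.dat", "DE07", 2010),
]

engine = TwoStageEngine(catalog, lambda path: FAKE_REPO[path])
hits = engine.query(lambda m: m.station == "NL01", lambda v: v > 5.0)
```

In this sketch, a query restricted to station `NL01` never ingests `c.dat`, which mirrors the claim that only the data required per query is ingested. A user could also inspect the result of `stage_one` alone to refine the query before any actual data is read, which is one way metadata could support interactivity.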