Turning scientists into data explorers

Authors:
Yağız Kargın
Affiliations:
CWI & University of Amsterdam, Amsterdam, Netherlands
Venue:
Proceedings of the 2013 SIGMOD/PODS Ph.D. Symposium
Year:
2013

Abstract

Scientists now receive increasingly large volumes of data every day. These volumes, together with the metadata that describe them, are collected in scientific file repositories. Today's scientists need a data management tool that makes these file repositories accessible and performs a number of exploration steps near-instantly. Current database technology, however, has a long data-to-insight time and does not provide enough interactivity to shorten the exploration time. We envision that exploiting metadata helps solve these problems. To this end, we propose a novel query execution paradigm in which we decompose query execution into two stages. During the first stage, we process only metadata, whereas the rest of the data is processed during the second stage. This way, we can exploit metadata to boost interactivity and to transparently ingest only the data required per query. Preliminary experiments show that up-front ingestion time is reduced by orders of magnitude, while query performance remains similar. Motivated by these results, we identify the challenges on the way from the new paradigm to efficient interactive data exploration.
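
The two-stage idea in the abstract can be illustrated with a small sketch. This is not the author's implementation; it is a minimal Python example under stated assumptions. The names `FileMeta`, `two_stage_query`, `FAKE_REPO`, and the `station`/`year` metadata fields are hypothetical and chosen only for illustration, and an in-memory dictionary stands in for a real file repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record describing one file in the repository."""
    path: str
    station: str
    year: int


def two_stage_query(
    catalog: Iterable[FileMeta],
    meta_pred: Callable[[FileMeta], bool],
    loader: Callable[[str], Iterable[float]],
    data_pred: Callable[[float], bool],
) -> Tuple[List[FileMeta], List[Tuple[str, int, float]]]:
    # Stage 1: evaluate the metadata predicate only; no file is opened here.
    candidates = [m for m in catalog if meta_pred(m)]

    # Stage 2: ingest only the files selected in stage 1 and filter their data.
    rows = []
    for m in candidates:
        for value in loader(m.path):
            if data_pred(value):
                rows.append((m.station, m.year, value))
    return candidates, rows


# Assumed stand-in for a scientific file repository (file name -> measurements).
FAKE_REPO = {
    "a_2011.dat": [1.0, 7.5],
    "a_2012.dat": [3.2, 9.1],
    "b_2012.dat": [8.8, 0.4],
}

catalog = [
    FileMeta("a_2011.dat", "A", 2011),
    FileMeta("a_2012.dat", "A", 2012),
    FileMeta("b_2012.dat", "B", 2012),
]

loaded: List[str] = []  # records which files were actually ingested


def loader(path: str) -> List[float]:
    loaded.append(path)
    return FAKE_REPO[path]


candidates, rows = two_stage_query(
    catalog,
    meta_pred=lambda m: m.year == 2012,  # answered from metadata alone
    loader=loader,
    data_pred=lambda v: v > 5.0,         # requires the actual data
)
print(loaded)  # only the 2012 files were read
print(rows)
```

In this sketch the 2011 file is never read, which mirrors the claim that only the data required by a query is ingested. In an interactive setting, the result of the first stage (the candidate file list) could also be shown to the user before the second stage runs, though how the actual system exposes this is not stated in the abstract.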