Efficient exploration of large scientific databases

Authors:
Etzard Stolte; Gustavo Alonso
Affiliations:
Department of Computer Science, Swiss Federal Institute of Technology (ETHZ), ETH Zentrum, Zürich, Switzerland; Department of Computer Science, Swiss Federal Institute of Technology (ETHZ), ETH Zentrum, Zürich, Switzerland
Venue:
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Year:
2002

Citing 22
Cited 6

Answering queries using views (extended abstract)

PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
DEVise: integrated querying and visual exploration of large datasets

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Complexity of answering queries using materialized views

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
New sampling-based summary statistics for improving approximate query answers

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Databases and visualization

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Data cube approximation and histograms via wavelets

Proceedings of the seventh international conference on Information and knowledge management
Rewriting aggregate queries using views

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximate computation of multidimensional aggregates of sparse data using wavelets

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Compressed data cubes for OLAP aggregate query approximation on continuous dimensions

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Designing and mining multi-terabyte astronomy archives: the Sloan Digital Sky Survey

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases

IEEE Transactions on Visualization and Computer Graphics
VisDB: Database Exploration Using Multidimensional Visualization

IEEE Computer Graphics and Applications
Optimizing Scientific Databases for Client Side Data Processing

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Optimizing Queries with Materialized Views

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Comparison of Remote Visualization Strategies for Interactive Exploration of Large Data Sets

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Optimal Histograms with Quality Guarantees

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Histogram-Based Approximation of Set-Valued Query-Answers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Approximate Query Processing Using Wavelets

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Dynamic Maintenance of Wavelet-Based Histograms

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Universality of Serial Histograms

VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
WaveCluster: a wavelet-based clustering approach for spatial data in very large databases

The VLDB Journal — The International Journal on Very Large Data Bases

Approximated trial and error analysis in scientific databases

Information Systems - Special issue: Best papers from EDBT 2002
Just-in-time aspects: efficient dynamic weaving for Java

Proceedings of the 2nd international conference on Aspect-oriented software development
Scientific data repositories: designing for a moving target

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Algebraic manipulation of scientific datasets

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficient lineage tracking for scientific workflows

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Finding haystacks with needles: ranked search for data using geospatial and temporal characteristics

SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management

Abstract

One of the challenging aspects of scientific data repositories is how to efficiently explore the catalogues that describe the data. We encountered this problem while developing HEDC, the HESSI Experimental Data Center, a multi-terabyte repository built for the recently launched HESSI satellite. In HEDC, scientific users will soon be confronted with a catalogue of many millions of tuples. In this paper we present a novel technique that allows users to explore such a large data space efficiently and interactively. Our approach is to store a copy of the relevant fields in segmented, wavelet-encoded views that are streamed to specialized clients. These clients use approximated data and adaptive decoding techniques to let users quickly visualize the search space. We describe how this approach reduces the time needed to generate meaningful visualizations of millions of tuples from hours to seconds.
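
The abstract does not say which wavelet or coefficient layout HEDC uses, so the following is only an illustrative sketch of the general idea of wavelet-encoded data that a client can decode progressively. It assumes a one-dimensional Haar transform over a small histogram of catalogue counts; the function names `haar_encode` and `haar_decode` and the sample values are hypothetical and not taken from the paper.

```python
from typing import List


def haar_encode(values: List[float]) -> List[float]:
    """Full Haar decomposition (assumed scheme, not necessarily the paper's).

    Output layout: [overall average, coarsest detail, ..., finest details],
    so a prefix of the output is already a coarse approximation.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    while n > 1:
        half = n // 2
        # Pairwise averages carry the coarse signal, pairwise half-differences the detail.
        avgs = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = avgs + diffs
        n = half
    return out


def haar_decode(coeffs: List[float], received: int) -> List[float]:
    """Reconstruct from the first `received` coefficients only.

    Coefficients that have not arrived yet are treated as zero, which yields
    a blockwise-constant approximation of the original values.
    """
    n = len(coeffs)
    work = list(coeffs[:received]) + [0.0] * (n - received)
    size = 1
    while size < n:
        avgs, diffs = work[:size], work[size:2 * size]
        expanded: List[float] = []
        for a, d in zip(avgs, diffs):
            expanded += [a + d, a - d]
        work[:2 * size] = expanded
        size *= 2
    return work


# Hypothetical histogram: number of catalogue tuples per time bin.
counts = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0]
encoded = haar_encode(counts)

# A client can draw a rough picture after a few coefficients and refine it later.
for k in (1, 2, 4, 8):
    print(k, haar_decode(encoded, k))
```

With this layout, the first coefficient alone gives the overall average, each further prefix doubles the resolution, and the full set reproduces the original counts exactly. That is one way a client could show a usable approximation before the whole view has been streamed.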