Efficient exploration of large scientific databases

  • Authors:
  • Etzard Stolte;Gustavo Alonso

  • Affiliations:
  • Department of Computer Science, Swiss Federal Institute of Technology (ETHZ), ETH Zentrum, Zürich, Switzerland;Department of Computer Science, Swiss Federal Institute of Technology (ETHZ), ETH Zentrum, Zürich, Switzerland

  • Venue:
  • VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases
  • Year:
  • 2002

Abstract

One of the challenging aspects of scientific data repositories is how to efficiently explore the catalogues that describe the data. We encountered this problem while developing HEDC, the HESSI Experimental Data Center, a multi-terabyte repository built for the recently launched HESSI satellite. In HEDC, scientific users will soon be confronted with a catalogue of many millions of tuples. In this paper we present a novel technique that allows users to explore such a large data space efficiently and interactively. Our approach is to store a copy of the relevant fields in segmented, wavelet-encoded views that are streamed to specialized clients. These clients use approximated data and adaptive decoding techniques to let users quickly visualize the search space. We describe how this approach reduces the time needed to generate meaningful visualizations of millions of tuples from hours to seconds.
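
The idea of streaming a wavelet-encoded view and decoding it adaptively can be illustrated with a minimal sketch. The abstract does not say which wavelet or encoding layout HEDC uses, so the unnormalized Haar transform below, the function names haar_encode and haar_decode, and the sample histogram are all assumptions chosen for illustration. The sketch shows only the general principle: a client that has received just the first few coefficients can already draw a coarse approximation and refine it as more coefficients arrive.

```python
from typing import List


def haar_encode(values: List[float]) -> List[float]:
    """Unnormalized Haar transform (illustrative assumption, not HEDC's codec).

    The input length must be a power of two. The output is ordered
    coarse-to-fine: [overall average, coarsest detail, ..., finest details].
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_decode(coeffs: List[float], received: int) -> List[float]:
    """Reconstruct from the first `received` coefficients only.

    Coefficients that have not arrived yet are treated as zero, which
    yields a blockwise-average approximation of the original values.
    """
    n = len(coeffs)
    out = [c if i < received else 0.0 for i, c in enumerate(coeffs)]
    length = 1
    while length < n:
        expanded = []
        for i in range(length):
            avg, det = out[i], out[length + i]
            expanded += [avg + det, avg - det]
        out[: 2 * length] = expanded
        length *= 2
    return out


# Hypothetical histogram of one catalogue field (tuple counts per bin).
histogram = [4, 6, 10, 12, 8, 8, 2, 2]
encoded = haar_encode(histogram)

# Simulate a client decoding after 1, 2, 4, and all 8 coefficients arrive.
for received in (1, 2, 4, 8):
    print(received, haar_decode(encoded, received))
```

With one coefficient the client sees only the global average; with all eight it recovers the histogram exactly. A real system would additionally segment the view and decide adaptively which coefficients to request, which this sketch does not attempt.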