Approximated trial and error analysis in scientific databases

  • Authors: Etzard Stolte; Gustavo Alonso

  • Affiliation (both authors): Department of Computer Science, Institute for Information Systems, Swiss Federal Institute of Technology (ETH), ETH Zentrum, CH-8092 Zürich, Switzerland

  • Venue: Information Systems - Special issue: Best papers from EDBT 2002
  • Year: 2003

Abstract

Databases are now just one building block among many in complex multi-tier architectures. In general, however, they are still designed and optimized with little regard for the applications that will run on top of them. This problem is particularly acute in scientific applications, where the data is never used or viewed as is but is always processed first, either for visualization or for analysis. In such scenarios, the data is usually processed at the client and, hence, conventional server-side optimizations are of limited help. In this paper we present a variety of techniques and a novel client/server architecture designed to optimize the client-side processing of scientific data. The core of our approach is to store frequently accessed data as relatively small, wavelet-encoded segments. These segments can be processed at different resolutions, thereby enabling efficient processing of very large data volumes. Experimental results demonstrate that our approach significantly reduces overhead in I/O, network transfer, decoding, and analysis. Furthermore, it does not require changes to the analysis routines and provides all possible resolution ranges. In the paper we describe these ideas and how they have been implemented in HEDC (RHESSI Experimental Data Center), a multi-TByte data hub for RHESSI, the Reuven Ramaty High Energy Solar Spectroscopic Imager satellite of NASA.
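To make the idea of processing wavelet-encoded segments at different resolutions concrete, here is a minimal sketch. It is not the HEDC implementation: the abstract does not say which wavelet family or storage layout is used, so the sketch assumes a plain Haar transform on a one-dimensional segment whose length is a power of two. The function names `haar_encode` and `haar_decode` are hypothetical.

```python
def haar_encode(segment):
    """Haar-decompose a power-of-two-length sequence (assumed wavelet, for illustration).

    Returns (average, details), where details is a list of coefficient
    levels ordered from coarsest to finest.
    """
    n = len(segment)
    if n == 0 or n & (n - 1):
        raise ValueError("segment length must be a power of two")
    approx = [float(x) for x in segment]
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        averages = [(approx[i] + approx[i + 1]) / 2 for i in pairs]
        differences = [(approx[i] - approx[i + 1]) / 2 for i in pairs]
        details.insert(0, differences)  # keep coarsest level first
        approx = averages
    return approx[0], details


def haar_decode(average, details, levels=None):
    """Rebuild the segment using only the first `levels` detail levels.

    levels=None uses every level and restores the original data exactly;
    smaller values give a coarser approximation with 2**levels samples.
    """
    if levels is None:
        levels = len(details)
    approx = [average]
    for differences in details[:levels]:
        finer = []
        for a, d in zip(approx, differences):
            finer.extend((a + d, a - d))
        approx = finer
    return approx


if __name__ == "__main__":
    average, details = haar_encode([9, 7, 3, 5, 6, 10, 2, 6])
    for levels in range(len(details) + 1):
        print(levels, haar_decode(average, details, levels))
```

Because the coefficients are ordered from coarse to fine, a client in this sketch needs to read and decode only a prefix of the encoded segment to obtain a lower-resolution view. That prefix property is what would reduce I/O, transfer, and decoding work. The decoded output is an ordinary sequence, so an analysis routine can consume it unchanged.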