Approximated trial and error analysis in scientific databases

Authors:
Etzard Stolte; Gustavo Alonso
Affiliations:
Department of Computer Science, Institute for Information Systems, Swiss Federal Institute of Technology (ETH), ETH Zentrum, CH-8092 Zürich, Switzerland (both authors)
Venue:
Information Systems - Special issue: Best papers from EDBT 2002
Year:
2003

Abstract

Databases today are just one building block in complex multi-tier architectures. In general, however, they are still designed and optimized with little regard for the applications that will run on top of them. This problem is particularly acute in scientific applications, where the data is never used or viewed in its raw form but is always processed first, for either visualization or analysis. In such scenarios, the data is usually processed at the client and, hence, conventional server-side optimizations are of limited help. In this paper we present a variety of techniques and a novel client/server architecture designed to optimize the client-side processing of scientific data. The core of our approach is to store frequently accessed data as relatively small, wavelet-encoded segments. These segments can be processed at different resolutions, thereby enabling efficient processing of very large data volumes. Experimental results demonstrate that our approach significantly reduces overhead in I/O, network transfer, decoding, and analysis. Furthermore, it does not require changes to the analysis routines and provides all possible resolution ranges. We describe these ideas and how they have been implemented in HEDC (the RHESSI Experimental Data Center), a multi-terabyte data hub for RHESSI, NASA's Reuven Ramaty High Energy Solar Spectroscopic Imager satellite.
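
To make the idea of processing a wavelet-encoded segment at different resolutions concrete, here is a minimal sketch. It assumes a one-dimensional Haar transform over a segment whose length is a power of two; the abstract does not say which wavelet or segment layout HEDC uses, and the function names `haar_encode` and `haar_decode` are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Tuple


def haar_encode(signal: List[float]) -> Tuple[float, List[List[float]]]:
    """Encode a segment with an (unnormalized) Haar transform.

    Assumption: the segment length is a power of two.
    Returns the overall average and the detail levels, coarsest first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("segment length must be a power of two")
    approx = [float(x) for x in signal]
    details: List[List[float]] = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Each pair is replaced by its average; the half-difference is kept
        # as the detail needed to undo that averaging step.
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    details.reverse()  # store coarsest level first
    return approx[0], details


def haar_decode(
    average: float,
    details: List[List[float]],
    levels: Optional[int] = None,
) -> List[float]:
    """Rebuild the segment using only the first `levels` detail levels.

    levels=None decodes at full resolution; smaller values read fewer
    coefficients and return a shorter, coarser version of the segment.
    """
    if levels is None:
        levels = len(details)
    approx = [average]
    for detail in details[:levels]:
        refined: List[float] = []
        for a, d in zip(approx, detail):
            refined.extend((a + d, a - d))
        approx = refined
    return approx


segment = [2, 4, 6, 8, 10, 12, 14, 16]
average, details = haar_encode(segment)

coarse = haar_decode(average, details, levels=1)  # 2 values
full = haar_decode(average, details)              # all 8 values

# An unchanged analysis routine (here: the mean) runs on either resolution.
mean_coarse = sum(coarse) / len(coarse)
mean_full = sum(full) / len(full)
```

In this sketch a client that only needs a rough view decodes the first few detail levels and so touches fewer coefficients, while the same analysis code runs unchanged on the coarse or the full reconstruction. Because Haar approximations are block averages, the mean is identical at every resolution here; other statistics would generally only be approximated.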