A hybrid approach for multiresolution modeling of large-scale scientific data

Authors:
Tina Eliassi-Rad; Terence Critchlow
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005

Abstract

Simulations of complex scientific phenomena involve the execution of massively parallel computer programs. These simulation programs generate large-scale multidimensional data sets over a spatio-temporal region. Analyzing such massive data sets is an essential step in helping scientists glean new information. To this end, efficient and effective data models are needed. In this paper, we present a hybrid approach for constructing data models from large-scale multidimensional scientific data sets. Our models not only provide descriptive information about the data but also allow users to subsequently examine the data by querying the data models. Our approach combines a multiresolution-topological model of the data with a multivariate-physical model of the data to generate one hierarchical data model that efficiently captures both the spatio-temporal and the physical aspects of the data. In particular, this hybrid approach consists of three phases. In the first phase, we build a multiresolution model that encapsulates the data set's spatial information (i.e., topology and spatial connectivity). In the second phase, we build a multivariate model from the physical dimensions of the data set. Physical dimensions are those dimensions that are neither spatial (x, y, z) nor temporal (time). Excluding the spatio-temporal dimensions from this second (clustering) phase is important because regions with "similar" physical characteristics can be located spatially far from each other. Finally, in the third phase, we connect the multivariate-physical model to the multiresolution-topological model by utilizing ideas from information retrieval. The third phase is essential because the multivariate-physical model contains no topological information and therefore lacks accurate spatial context. Experimental evaluations on two large-scale multidimensional scientific data sets illustrate the value of our hybrid approach.
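
The three phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the partitioning scheme, the clustering algorithm, or the information-retrieval technique, so the sketch assumes median bisection for the spatial hierarchy, plain k-means for the physical variables, and term-frequency-style weights (clusters treated as "terms", spatial nodes as "documents") for the linking step. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def build_tree(coords, idx, max_leaf=4):
    """Phase 1 (assumed scheme): recursively bisect the point set at the
    median of its widest spatial axis. Each node keeps its point indices,
    so every tree level is a coarser or finer view of the same region."""
    node = {"idx": idx, "children": []}
    if len(idx) > max_leaf:
        dims = len(coords[0])
        axis = max(range(dims), key=lambda d: max(coords[i][d] for i in idx)
                                              - min(coords[i][d] for i in idx))
        order = sorted(idx, key=lambda i: coords[i][axis])
        mid = len(order) // 2
        node["children"] = [build_tree(coords, order[:mid], max_leaf),
                            build_tree(coords, order[mid:], max_leaf)]
    return node

def kmeans(points, k, iters=20, seed=0):
    """Phase 2 (assumed algorithm): Lloyd's k-means on the physical
    variables only. Spatial and temporal coordinates are never passed in.
    Requires at least k distinct points."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(points)), k)  # distinct starting centers
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def annotate(node, labels):
    """Phase 3 (assumed linking): store, for every spatial node, the
    fraction of its points that fall in each physical cluster."""
    counts = Counter(labels[i] for i in node["idx"])
    total = len(node["idx"])
    node["weights"] = {c: n / total for c, n in counts.items()}
    for child in node["children"]:
        annotate(child, labels)

# Toy data: a 4 x 4 spatial grid with two made-up physical variables.
# The two physical regimes alternate like a checkerboard, so physically
# similar cells are spatially scattered.
coords = [(x, y) for x in range(4) for y in range(4)]
physical = [(100.0, 5.0) if (x + y) % 2 == 0 else (10.0, 1.0)
            for x, y in coords]

root = build_tree(coords, list(range(len(coords))), max_leaf=4)
labels = kmeans(physical, k=2)
annotate(root, labels)
```

Under these assumptions, a query for one physical cluster could descend the tree and skip any node whose weight for that cluster is zero, which is one way the combined model could supply the spatial context that the clusters alone lack.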