Extensible Parallel Query Processing for Exploratory Geoscientific Data Mining

  • Authors:
  • Eddie C. Shek; Richard R. Muntz; Edmond Mesrobian

  • Affiliations:
  • Information Sciences Laboratory, HRL Laboratories, LLC., Malibu, CA 90265, USA (shek@hrl.com); Computer Science Department, University of California, Los Angeles, CA 90024, USA (muntz@cs.ucla.edu); Disney Online, Burbank, CA 91521, USA (edmond@online.disney.com)

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2001

Abstract

Exploratory data mining and analysis requires a computing environment that provides facilities for the user-friendly expression and rapid execution of “scientific queries.” In this paper, we address research issues in the parallelization of scientific queries containing complex user-defined operations. In a parallel query execution environment, parallelizing a query execution plan involves determining how the input data streams to evaluators implementing logical operations can be divided so that clones of the same evaluator can process them in parallel. We introduce the concept of a “relevance window,” which characterizes the data lineage and data partitioning opportunities available for a user-defined evaluator. In addition, we develop a query parallelization framework by extending relational parallel query optimization algorithms so that the parallelization characteristics of user-defined evaluators guide query parallelization in an extensible query processing environment. We demonstrate the utility of our system through experiments that mine cyclonic activity, blocking events, and upward wave-energy propagation features from several observational and model simulation datasets.
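
The abstract does not define the relevance window formally, so the following is only an illustrative sketch of one possible reading, not the authors' framework. It assumes that an evaluator's relevance window says which neighboring input items each output item depends on, here a fixed radius around each position in a one-dimensional series. Under that assumption, the input can be split into chunks padded by the radius, each chunk can be handed to a clone of the evaluator, and the padded outputs can be trimmed and concatenated. The names `moving_average`, `run_partitioned`, `radius`, and `n_clones` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def moving_average(values, radius):
    """Hypothetical user-defined evaluator: centered mean, truncated at the ends.

    Output i depends only on inputs i - radius .. i + radius, which is the
    assumed "relevance window" in this sketch.
    """
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def run_partitioned(values, radius, n_clones):
    """Split the input among evaluator clones, padding each chunk by the radius."""
    n = len(values)
    if n == 0:
        return []
    chunk = -(-n // n_clones)  # ceiling division: outputs owned by each clone

    def work(start):
        end = min(n, start + chunk)
        # Extend the chunk by the relevance window so edge outputs see
        # the same inputs they would see in a serial run.
        pad_start = max(0, start - radius)
        pad_end = min(n, end + radius)
        padded_out = moving_average(values[pad_start:pad_end], radius)
        # Keep only the outputs this clone owns.
        return padded_out[start - pad_start:end - pad_start]

    with ThreadPoolExecutor(max_workers=n_clones) as pool:
        parts = pool.map(work, range(0, n, chunk))  # map preserves chunk order
        return [x for part in parts for x in part]


if __name__ == "__main__":
    data = [float((i * i) % 7) for i in range(50)]
    serial = moving_average(data, radius=2)
    parallel = run_partitioned(data, radius=2, n_clones=4)
    print(serial == parallel)
```

Threads are used here only to keep the sketch short and self-contained; the point is the partitioning rule, in which the declared window of the evaluator, not its internal code, determines how much overlap each partition needs.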