Optistore: an on-demand data processing middleware for very large scale interactive visualization

  • Authors:
  • Jason Leigh; Chong Zhang

  • Affiliations:
  • University of Illinois at Chicago; University of Illinois at Chicago

  • Year:
  • 2007


Visualization

Abstract

OptiStore is an on-demand data processing middleware for extremely large-scale interactive visualization applications. It was designed and implemented to help visualization users access large amounts of data (terabytes to petabytes) at remote locations, query them on distributed servers, transfer them among high-performance clusters interconnected by optical networks, and transform them from one data model to another in near real time. In contrast to the predominant strategy of preprocessing data at the repository before visualization, OptiStore processes data on demand and interactively, minimizing the need to manage extraneous preprocessed copies of the data, a burden that will only grow as scientists continue to amass vast amounts of data. Furthermore, OptiStore is an extensible middleware framework into which new data structures and filters can be integrated. To address the scalability of data size, interactivity in data exploration and scientific computation, and flexibility of data filter deployment, this dissertation proposes the following techniques: load-balancing data partitioning and organization, multi-resolution analysis, view-dependent data selection, runtime data preprocessing, and dedicated parallel data filtering. To achieve high overall utilization and reduce latency, a load-balancing data partitioning and organization mechanism was applied. To ensure scalability with data size, multi-resolution analysis and visibility culling were used to process only the data needed for the visualization application's current view. To take advantage of increasing network bandwidth, the data filters were decoupled from the visualization applications and the data repository servers by transferring the bulk of the data over high-speed network infrastructure, so that users can explore larger datasets available at remote sites and deploy their own filters flexibly.
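The combination of visibility culling and multi-resolution analysis described above can be sketched in a few lines. The abstract does not give OptiStore's actual data structures, so the `Block` record, the axis-aligned view-volume test, and the distance-based level-of-detail rule below are all illustrative assumptions, not the dissertation's implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Block:
    center: tuple      # (x, y, z) center of a cubic data block (hypothetical layout)
    half_size: float   # half the edge length of the block
    levels: int        # number of resolution levels available (0 = finest)

def intersects(block, view_min, view_max):
    """Axis-aligned overlap test between a block and the view volume."""
    return all(
        block.center[i] + block.half_size >= view_min[i] and
        block.center[i] - block.half_size <= view_max[i]
        for i in range(3)
    )

def select_blocks(blocks, view_min, view_max, eye, lod_scale=10.0):
    """Visibility culling plus view-dependent resolution selection:
    discard blocks outside the view volume, then assign each visible
    block a resolution level that coarsens with distance from the eye."""
    selected = []
    for b in blocks:
        if not intersects(b, view_min, view_max):
            continue  # culled: this data never reaches the filter pipeline
        level = min(b.levels - 1, int(math.dist(eye, b.center) / lod_scale))
        selected.append((b, level))
    return selected
```

Only the `(block, level)` pairs returned here would need to be read, filtered, and shipped over the network, which is what keeps the processed data volume proportional to the view rather than to the full dataset.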
To minimize data access latency, a novel caching algorithm and a prediction model for prefetching and preprocessing were developed. Experiments demonstrate that the system is both effective and efficient.
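The abstract names a caching algorithm and a prefetch prediction model without detailing either, so the sketch below stands in with standard building blocks: an LRU cache whose loader is invoked ahead of time for the block a simple sequential predictor expects next. The class name, the sequential-prediction rule, and the `loader` callback are assumptions for illustration only:

```python
from collections import OrderedDict

class PrefetchingCache:
    """LRU cache that, on each miss, also loads the block a (here:
    sequential) predictor expects to be requested next, so a later
    request for it can be served without waiting on the data source."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader           # loader(block_id) -> data
        self.cache = OrderedDict()     # block_id -> data, in LRU order
        self.hits = 0
        self.misses = 0

    def _put(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used

    def get(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.misses += 1
        data = self.loader(block_id)
        self._put(block_id, data)
        # Prefetch: assume sequential access and load the next block now.
        nxt = block_id + 1
        if nxt not in self.cache:
            self._put(nxt, self.loader(nxt))
        return data
```

In a real middleware the prefetch would be issued asynchronously and the predictor would be learned from access history; the synchronous sequential version here only shows where prediction hooks into the cache.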