FREERIDE-G: enabling distributed processing of large datasets

  • Authors: Leonid Glimcher; Gagan Agrawal

  • Affiliations: The Ohio State University, Columbus, OH, USA (both authors)

  • Venue: DADC '08: Proceedings of the 2008 International Workshop on Data-Aware Distributed Computing
  • Year: 2008

Abstract

We have been developing middleware that enables the development, support, and deployment of services which transparently access and process data from remote servers, remain compatible with grid standards and frameworks, and are nevertheless efficient and scalable. The middleware is called FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have integrated it with grid computing standards through the Globus Toolkit, specifically through MPICH-G2. The middleware must also handle the case where the available data is spread across multiple clusters. This requires schedules for data movement and processing that minimize overheads and achieve load balancing. Because the datasets may be vertically partitioned, wrappers must also be generated automatically to bridge format differences.
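
The scheduling problem mentioned in the abstract can be illustrated with a minimal sketch. This is not the FREERIDE-G scheduler, which the abstract does not describe; it is a generic greedy heuristic that places the largest chunks first and charges a transfer cost only when a chunk is processed away from its home site. The function name `schedule_chunks`, the tuple layouts, and the per-MB cost constants are all hypothetical and chosen only for illustration.

```python
def schedule_chunks(chunks, nodes, transfer_s_per_mb=0.1, compute_s_per_mb=0.05):
    """Greedy placement of data chunks on compute nodes (illustrative only).

    chunks: list of (chunk_id, home_site, size_mb)
    nodes:  list of (node_id, site)
    The cost constants are assumed values, not measurements from the paper.
    """
    load = {node_id: 0.0 for node_id, _ in nodes}        # estimated busy time per node
    assignment = {node_id: [] for node_id, _ in nodes}   # chunk ids placed on each node

    # Place the largest chunks first so small chunks can even out the load later.
    for chunk_id, chunk_site, size_mb in sorted(chunks, key=lambda c: -c[2]):
        best_id, best_finish = None, float("inf")
        for node_id, node_site in nodes:
            # Moving data is free only when the node sits at the chunk's home site.
            move = 0.0 if node_site == chunk_site else size_mb * transfer_s_per_mb
            finish = load[node_id] + move + size_mb * compute_s_per_mb
            if finish < best_finish:
                best_id, best_finish = node_id, finish
        load[best_id] = best_finish
        assignment[best_id].append(chunk_id)

    return assignment, load


if __name__ == "__main__":
    chunks = [("a", "A", 100), ("b", "A", 100), ("c", "B", 100)]
    nodes = [("n1", "A"), ("n2", "B")]
    assignment, load = schedule_chunks(chunks, nodes)
    print(assignment)
    print(load)
```

The heuristic trades data locality against balance: a chunk stays at its home site until the local node's estimated finish time exceeds the cost of shipping the chunk elsewhere. A real scheduler would replace the constant costs with measured bandwidth and processing rates.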