We have been developing middleware that enables the development, support, and deployment of services that can transparently access and process data from remote servers, are compatible with grid standards and frameworks, and yet remain efficient and scalable. Our middleware is referred to as FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have integrated it with grid computing standards through the Globus Toolkit, specifically MPICH-G2. The middleware must also handle the possibility that the available data is spread across multiple clusters. In that case, we need to develop schedules for data movement and processing that minimize overheads and achieve load balancing. Since the datasets may be vertically partitioned, we also need to generate wrappers automatically to bridge format differences.
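To make the scheduling problem concrete, the C++ sketch below shows one way such a schedule could be built: a greedy heuristic that assigns each data chunk to the compute node with the earliest estimated finish time, charging an extra transfer cost whenever the chunk must be moved off its home cluster. This is only an illustrative sketch under assumed cost estimates; the types Chunk and Node, the function greedySchedule, and the example figures are hypothetical and are not taken from the FREERIDE-G implementation.

```cpp
// Greedy data-movement/processing schedule sketch (hypothetical, not FREERIDE-G code).
// Each chunk goes to the node that currently finishes earliest, with a transfer
// penalty added when the chunk does not already reside on that node's cluster.
#include <iostream>
#include <string>
#include <vector>

struct Chunk {
    std::string id;
    std::string homeCluster;  // cluster where the chunk is stored
    double processCost;       // estimated processing time
    double transferCost;      // estimated cost to move it to another cluster
};

struct Node {
    std::string cluster;      // cluster the compute node belongs to
    double load = 0.0;        // accumulated work assigned so far
};

// Assign each chunk to the node minimizing (current load + processing + movement).
std::vector<int> greedySchedule(const std::vector<Chunk>& chunks,
                                std::vector<Node>& nodes) {
    std::vector<int> assignment(chunks.size(), -1);
    for (size_t c = 0; c < chunks.size(); ++c) {
        int best = -1;
        double bestFinish = 0.0;
        for (size_t n = 0; n < nodes.size(); ++n) {
            double cost = chunks[c].processCost;
            if (nodes[n].cluster != chunks[c].homeCluster)
                cost += chunks[c].transferCost;  // pay for data movement
            double finish = nodes[n].load + cost;
            if (best < 0 || finish < bestFinish) {
                best = static_cast<int>(n);
                bestFinish = finish;
            }
        }
        nodes[best].load = bestFinish;
        assignment[c] = best;
    }
    return assignment;
}

int main() {
    std::vector<Chunk> chunks = {
        {"c0", "clusterA", 4.0, 2.0}, {"c1", "clusterB", 3.0, 2.5},
        {"c2", "clusterA", 5.0, 1.5}, {"c3", "clusterB", 2.0, 3.0}};
    std::vector<Node> nodes = {{"clusterA"}, {"clusterB"}};
    std::vector<int> plan = greedySchedule(chunks, nodes);
    for (size_t c = 0; c < plan.size(); ++c)
        std::cout << chunks[c].id << " -> node " << plan[c] << " ("
                  << nodes[plan[c]].cluster << ")\n";
    return 0;
}
```

A greedy rule of this kind keeps per-chunk decisions cheap while still trading movement overhead against load balance; an actual scheduler would refine the cost estimates with measured bandwidth and node capacities.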