We have been developing middleware that enables the development, support, and deployment of services that can transparently access and process data from remote servers, are compatible with grid standards and frameworks, and yet remain efficient and scalable. Our middleware is referred to as FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid). We have integrated it with grid computing standards through the Globus Toolkit, specifically MPICH-G2. The middleware must also handle the possibility that the available data is spread across multiple clusters. In that case, we need to develop schedules for data movement and processing that minimize overheads and achieve load balancing. Since the datasets may be vertically partitioned, we also need to generate wrappers automatically to bridge format differences.
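To make the scheduling problem concrete, the C++ sketch below shows one way such a schedule could be built: a greedy heuristic that assigns each data chunk to the compute node with the earliest estimated finish time, charging an extra transfer cost whenever the chunk must be moved off its home cluster. This is only an illustrative sketch under assumed cost estimates; the types Chunk and Node, the function greedySchedule, and the example figures are hypothetical and are not taken from the FREERIDE-G implementation.

```cpp
// Greedy data-movement/processing schedule sketch (hypothetical, not FREERIDE-G code).
// Each chunk goes to the node that currently finishes earliest, with a transfer
// penalty added when the chunk does not already reside on that node's cluster.
#include <iostream>
#include <string>
#include <vector>

struct Chunk {
    std::string id;
    std::string homeCluster;  // cluster where the chunk is stored
    double processCost;       // estimated processing time
    double transferCost;      // estimated cost to move it to another cluster
};

struct Node {
    std::string cluster;      // cluster the compute node belongs to
    double load = 0.0;        // accumulated work assigned so far
};

// Assign each chunk to the node minimizing (current load + processing + movement).
std::vector<int> greedySchedule(const std::vector<Chunk>& chunks,
                                std::vector<Node>& nodes) {
    std::vector<int> assignment(chunks.size(), -1);
    for (size_t c = 0; c < chunks.size(); ++c) {
        int best = -1;
        double bestFinish = 0.0;
        for (size_t n = 0; n < nodes.size(); ++n) {
            double cost = chunks[c].processCost;
            if (nodes[n].cluster != chunks[c].homeCluster)
                cost += chunks[c].transferCost;  // pay for data movement
            double finish = nodes[n].load + cost;
            if (best < 0 || finish < bestFinish) {
                best = static_cast<int>(n);
                bestFinish = finish;
            }
        }
        nodes[best].load = bestFinish;
        assignment[c] = best;
    }
    return assignment;
}

int main() {
    std::vector<Chunk> chunks = {
        {"c0", "clusterA", 4.0, 2.0}, {"c1", "clusterB", 3.0, 2.5},
        {"c2", "clusterA", 5.0, 1.5}, {"c3", "clusterB", 2.0, 3.0}};
    std::vector<Node> nodes = {{"clusterA"}, {"clusterB"}};
    std::vector<int> plan = greedySchedule(chunks, nodes);
    for (size_t c = 0; c < plan.size(); ++c)
        std::cout << chunks[c].id << " -> node " << plan[c] << " ("
                  << nodes[plan[c]].cluster << ")\n";
    return 0;
}
```

A greedy rule of this kind keeps per-chunk decisions cheap while still trading movement overhead against load balance; an actual scheduler would refine the cost estimates with measured bandwidth and node capacities.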