Recent research in multi-agent systems incorporates fault tolerance concepts, but does not explore how such ideas can be extended to and implemented on large-scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks, which are mapped onto agents that traverse an abstracted hardware layer. The agents communicate with one another across processors to share information in the event of a predicted core or processor failure, so that the task can still be completed successfully. The feasibility of the approach is validated by implementing a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
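The idea described above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: it does not use the Message Passing Interface, the set of cores predicted to fail is supplied by hand, and all names (`Agent`, `decompose`, `migrate`, `tree_reduce`, `run`) are hypothetical. Each agent carries one sub-task, moves to a spare core if its current core is predicted to fail, and the partial results are then combined pairwise as in a parallel reduction.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Carries one sub-task (a slice of the input) and its partial result."""
    agent_id: int
    data: list
    core: int          # index of the (simulated) core the agent sits on
    partial: int = 0


def decompose(values, n_agents):
    """Split values into n_agents nearly equal contiguous chunks."""
    size, extra = divmod(len(values), n_agents)
    chunks, start = [], 0
    for i in range(n_agents):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def migrate(agent, failing, spare_cores):
    """Move an agent off a core predicted to fail onto a spare core.

    Assumption: a spare core is always available; pop() raises otherwise.
    """
    if agent.core in failing:
        agent.core = spare_cores.pop()


def tree_reduce(partials):
    """Combine partial sums pairwise, level by level (tree reduction)."""
    level = list(partials)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element is carried to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


def run(values, n_agents, failing=frozenset(), n_spares=2):
    """Map sub-tasks to agents, migrate where needed, then reduce."""
    chunks = decompose(values, n_agents)
    agents = [Agent(i, chunk, core=i) for i, chunk in enumerate(chunks)]
    spare_cores = list(range(n_agents, n_agents + n_spares))
    for agent in agents:
        migrate(agent, failing, spare_cores)   # react to the predicted failure
        agent.partial = sum(agent.data)        # execute the sub-task
    return tree_reduce([a.partial for a in agents]), [a.core for a in agents]


total, cores = run(list(range(1, 17)), n_agents=4, failing={2})
print(total, cores)  # 136 [0, 1, 5, 3]
```

In this run, the agent that started on core 2 relocates to spare core 5 before computing its partial sum, so the final result is unaffected by the predicted failure. On a real cluster, the pairwise combination step would typically be expressed with a collective reduction such as `MPI_Reduce`.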