Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local-area networks must be adapted to the large differences in link speed between local-area and wide-area connections. Collective operations, such as broadcast and reduce, form an important class of such algorithms. We have developed MAGPIE, a library of collective communication operations optimized for wide-area systems. MAGPIE's algorithms send the minimal amount of data over the slow wide-area links and incur only a single wide-area latency. Using our system, existing MPI applications can be run unmodified on geographically distributed systems. On moderate cluster sizes, with a wide-area latency of 10 milliseconds and a bandwidth of 1 MByte/s, MAGPIE executes operations up to 10 times faster than MPICH, a widely used MPI implementation; application kernels improve by up to a factor of 4. Due to the structure of our algorithms, MAGPIE's advantage increases for higher wide-area latencies.
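The following Python sketch illustrates the two properties claimed for the collective algorithms, applied to broadcast: few messages on wide-area links, and at most one wide-area hop on any path from the root. It is not MAGPIE's implementation. The function names, the doubling broadcast used as the cluster-unaware baseline, and the layout of 4 clusters with 4 processes each are all assumptions chosen for illustration.

```python
def flat_broadcast(ranks):
    """Doubling broadcast rooted at ranks[0] that ignores cluster boundaries.

    Returns a dict mapping each receiver to the rank it receives from.
    """
    parent = {}
    have = [ranks[0]]
    pending = list(ranks[1:])
    while pending:
        # Every rank that already holds the data forwards it once per round.
        for sender in list(have):
            if not pending:
                break
            receiver = pending.pop(0)
            parent[receiver] = sender
            have.append(receiver)
    return parent


def cluster_aware_broadcast(ranks, cluster_of):
    """Two-level broadcast (illustrative assumption, not MAGPIE's code).

    The root sends once to a coordinator in each remote cluster, and each
    cluster then broadcasts locally.
    """
    root = ranks[0]
    groups = {}
    for rank in ranks:
        groups.setdefault(cluster_of[rank], []).append(rank)

    parent = {}
    for cluster, members in groups.items():
        if cluster == cluster_of[root]:
            local = [root] + [m for m in members if m != root]
        else:
            local = members
            parent[members[0]] = root  # the only wide-area message for this cluster
        parent.update(flat_broadcast(local))
    return parent


def wan_cost(parent, cluster_of, root):
    """Return (wide-area message count, max wide-area hops on any root-to-rank path)."""
    def is_wan(child):
        return cluster_of[child] != cluster_of[parent[child]]

    def wan_hops(node):
        hops = 0
        while node != root:
            hops += is_wan(node)
            node = parent[node]
        return hops

    messages = sum(1 for child in parent if is_wan(child))
    return messages, max(wan_hops(node) for node in parent)


# Hypothetical layout: 16 ranks, 4 per cluster, rank 0 is the root.
ranks = list(range(16))
cluster_of = {rank: rank // 4 for rank in ranks}

flat_cost = wan_cost(flat_broadcast(ranks), cluster_of, root=0)
aware_cost = wan_cost(cluster_aware_broadcast(ranks, cluster_of), cluster_of, root=0)
print("flat:", flat_cost)
print("cluster-aware:", aware_cost)
```

In this toy layout the cluster-unaware tree crosses wide-area links 12 times and chains two wide-area hops on its longest path, while the two-level tree sends one wide-area message per remote cluster and never chains them. That is consistent with the abstract's observation that the advantage grows as wide-area latency increases.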