This work provides a systematic study of the impact of communication performance on parallel applications running on a high-performance network of workstations. We develop an experimental system in which communication latency, overhead, and bandwidth can be varied independently to observe their effects on a wide range of applications. Our results indicate that current efforts to bring cluster communication performance up to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications are strongly sensitive to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. The applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications show a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
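Why would an application tolerate added latency but not added overhead? One way to build intuition is the LogP cost model (Culler et al.). The mapping is an assumption: the abstract's latency is taken as LogP's `L`, its overhead as `o`, and its per-message bandwidth as the gap `g`. Per-byte bandwidth (LogGP's `G`) is left out for simplicity. The sketch below, a toy model and not the authors' experimental apparatus, computes how long one processor takes to stream `m` small messages to another. `L` is paid once, while `max(g, o)` is paid on every message. The names `LogP` and `stream_time` are hypothetical, and all parameter values below are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """Hypothetical container for LogP parameters (units: microseconds)."""
    L: float  # network latency for one small message
    o: float  # CPU overhead per message, paid by sender and by receiver
    g: float  # gap: minimum interval between consecutive injections


def stream_time(p: LogP, m: int) -> float:
    """Time until the receiver finishes processing m small messages sent
    back-to-back by a single sender, under the classic LogP model.

    The sender can inject a new message every max(g, o). The last message
    starts at (m - 1) * max(g, o), costs o to send, L to cross the network,
    and o to receive. Latency therefore appears once, while overhead and
    gap are multiplied by the number of messages.
    """
    if m < 1:
        raise ValueError("m must be >= 1")
    return (m - 1) * max(p.g, p.o) + 2 * p.o + p.L


if __name__ == "__main__":
    base = LogP(L=5, o=3, g=6)            # illustrative values, not measured
    more_latency = LogP(L=105, o=3, g=6)   # +100 us of latency
    more_overhead = LogP(L=5, o=103, g=6)  # +100 us of overhead
    for name, p in [("base", base), ("+L", more_latency), ("+o", more_overhead)]:
        print(name, stream_time(p, 1000))
```

With these made-up parameters and `m = 1000`, adding 100 µs of latency raises the total only from 6005 to 6105 µs. Adding 100 µs of overhead raises it to 103108 µs. In this toy model, once `o` exceeds `g` the total grows linearly in `o` with slope `m + 1`. That is one simple way a linear dependence on overhead can arise, though real applications also overlap and synchronize in ways the model ignores.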