The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared-memory parallel computers with full cache coherence. It employs a hierarchical structure of processing, communication, and memory name-space management resources to provide a scalable NUMA environment. Ensembles of 8 HP PA-RISC 7100 microprocessors employ an internal crossbar switch and a directory-based cache coherence scheme to provide a tightly coupled SMP. Up to 16 processing ensembles are interconnected by a four-ring network incorporating a full hardware implementation of the SCI protocol, for a full system configuration of 128 processors. This paper presents the findings of a set of empirical studies, using both synthetic test codes and full applications for the Earth and space sciences, to characterize the performance properties of this new architecture. It is shown that the overheads and latencies of global primitive mechanisms, while low in absolute time, are significantly higher than those of similar functions local to an individual processor ensemble.