MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Architecture of a message-driven processor
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Resource requirements of dataflow programs
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Two fundamental issues in multiprocessing
4th International DFVLR Seminar on Foundations of Engineering Sciences on Parallel Computing in Science and Engineering
Experience with CST: programming and implementation
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Lightweight remote procedure call
ACM Transactions on Computer Systems (TOCS)
Trap architectures for Lisp systems
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Monsoon: an explicit token-store architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Performance of Various Computers Using Standard Linear Equations Software
Performance of Various Computers Using Standard Linear Equations Software
A tightly-coupled processor-network interface
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Integrating message-passing and shared-memory: early experience
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Implementing an irregular application on a distributed memory multiprocessor
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Computation migration: enhancing locality for distributed-memory parallel systems
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
The shared-memory language pSather on a distributed-memory multiprocessor
ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
Programming models for irregular applications
ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
Efficient SPMD constructs for asynchronous message passing architectures
ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
The CM-5 Connection Machine: a scalable supercomputer
Communications of the ACM
Recent trends in experimental operating systems research
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
Issues and directions in scalable parallel computing
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Improving AP1000 parallel computer performance with message communication
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An implementation of the ε-relaxation algorithm on the CM-5
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Supporting sets of arbitrary connections on iWarp through communication context switches
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Object distribution in Orca using Compile-Time and Run-Time techniques
OOPSLA '93 Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications
Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
Super-threading: architectural and software mechanisms for optimizing parallel computation
ICS '93 Proceedings of the 7th international conference on Supercomputing
Space-efficient scheduling of multithreaded computations
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Efficient software-based fault isolation
SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
Developing parallel applications using high-performance simulation
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
Communication and computation performance of the CM-5
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
*T: integrated building blocks for parallel computing
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Techniques to overlap computation and communication in irregular iterative applications
ICS '94 Proceedings of the 8th international conference on Supercomputing
Programming, compilation, and resource management issues for multithreading (panel session II)
ACM SIGARCH Computer Architecture News - Special issue: panel sessions of the 1991 workshop on multithreaded computers
Processor allocation policies for message-passing parallel computers
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Virtual memory mapped network interface for the SHRIMP multicomputer
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
METRO: a router architecture for high-performance, short-haul routing networks
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Separating data and control transfer in distributed operating systems
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Integration of message passing and shared memory in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Software overhead in messaging layers: where does the time go?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
AP1000+: architectural support of PUT/GET interface for parallelizing compiler
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing parallel programs with explicit synchronization
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
High-level optimization via automated statistical modeling
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluating the locality benefits of active messages
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimistic active messages: a mechanism for scheduling communication with computation
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ACM Transactions on Computer Systems (TOCS)
Remote queues: exposing message queues for optimization and atomicity
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Trading packet headers for packet processing
SIGCOMM '95 Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
STAR/MPI: binding a parallel library to interactive symbolic algebra systems
ISSAC '95 Proceedings of the 1995 international symposium on Symbolic and algebraic computation
The interaction of parallel and sequential workloads on a network of workstations
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
U-Net: a user-level network interface for parallel and distributed computing
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Serverless network file systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Exokernel: an operating system architecture for application-level resource management
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Extensibility, safety and performance in the SPIN operating system
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Efficient support of location transparency in concurrent object-oriented programming languages
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
High performance messaging on workstations: Illinois fast messages (FM) for Myrinet
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimizing memory system performance for communication in parallel computers
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Empirical evaluation of the CRAY-T3D: a compiler perspective
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Decoupling synchronization and data transfer in message passing systems of parallel computers
ICS '95 Proceedings of the 9th international conference on Supercomputing
A compiler-directed distributed shared memory system
ICS '95 Proceedings of the 9th international conference on Supercomputing
ICS '95 Proceedings of the 9th international conference on Supercomputing
ICS '95 Proceedings of the 9th international conference on Supercomputing
Serverless network file systems
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Efficient strategies for software-only protocols in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Trading packet headers for packet processing
IEEE/ACM Transactions on Networking (TON)
Cosy: an operating system for highly parallel computers
ACM SIGOPS Operating Systems Review
pHluid: the design of a parallel functional language implementation on workstations
Proceedings of the first ACM SIGPLAN international conference on Functional programming
Polling watchdog: combining polling and interrupts for efficient message handling
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Early experience with message-passing on the SHRIMP multicomputer
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Effective distributed scheduling of parallel workloads
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Reducing network latency using subpages in a global memory environment
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Towards efficiency and portability: programming with the BSP model
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Fast Parallel Sorting Under LogP: Experience with the CM-5
IEEE Transactions on Parallel and Distributed Systems
Network-Based Multicomputers: A Practical Supercomputer Architecture
IEEE Transactions on Parallel and Distributed Systems
Strategic directions in storage I/O issues in large-scale computing
ACM Computing Surveys (CSUR) - Special ACM 50th-anniversary issue: strategic directions in computing research
Profiling and reducing processing overheads in TCP/IP
IEEE/ACM Transactions on Networking (TON)
Efficient data sharing with conditional remote memory transfers
ACM SIGARCH Computer Architecture News
Scheduler-conscious synchronization
ACM Transactions on Computer Systems (TOCS)
ASHs: Application-specific handlers for high-performance messaging
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Computational Optimization and Applications
High-performance sorting on networks of workstations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
An evaluation of bottom-up and top-down thread generation techniques
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Triplex: a multi-class routing algorithm
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
File server scaling with network-attached secure disks
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Flick: a flexible, optimizing IDL compiler
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
ASHs: application-specific handlers for high-performance messaging
IEEE/ACM Transactions on Networking (TON)
pSNOW: a tool to evaluate architectural issues for NOW environments
ICS '97 Proceedings of the 11th international conference on Supercomputing
ICS '97 Proceedings of the 11th international conference on Supercomputing
Ace: linguistic mechanisms for customizable protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance implications of communication mechanisms in all-software global address space systems
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
LoPC: modeling contention in parallel algorithms
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Object graph rewriting: an experimental parallel implementation
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
Modeling communication pipeline latency
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Monitoring shared virtual memory performance on a Myrinet-based PC cluster
ICS '98 Proceedings of the 12th international conference on Supercomputing
MBCF: a protected and virtualized high-speed user-level memory-based communication facility
ICS '98 Proceedings of the 12th international conference on Supercomputing
Highly efficient implementation of MPI point-to-point communication using remote memory operations
ICS '98 Proceedings of the 12th international conference on Supercomputing
Scheduling with implicit information in distributed systems
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
LoGPC: modeling network contention in message-passing programs
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Design choices in the SHRIMP system: an empirical study
Proceedings of the 25th annual international symposium on Computer architecture
A High Performance Message-Passing System for Network of Workstations
The Journal of Supercomputing - Special issue: high performance distributed computing
Adapting the Network Interface for High-Performance Computing: The CNI Approach
The Journal of Supercomputing - Special issue: high performance distributed computing
Performance monitoring in a Myrinet-connected SHRIMP cluster
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Searching for the sorting record: experiences in tuning NOW-Sort
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
A Performance Evaluation of the Convex SPP-1000 Scalable Shared Memory Parallel Computer
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
25 years of the international symposia on Computer architecture (selected papers)
Retrospective: Monsoon: an explicit token-store architecture
25 years of the international symposia on Computer architecture (selected papers)
Virtual memory mapped network interface for the SHRIMP multicomputer
25 years of the international symposia on Computer architecture (selected papers)
The Stanford FLASH multiprocessor
25 years of the international symposia on Computer architecture (selected papers)
Tempest and typhoon: user-level shared memory
25 years of the international symposia on Computer architecture (selected papers)
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
A distributed garbage collector with diffusion tree reorganisation and mobile objects
ICFP '98 Proceedings of the third ACM SIGPLAN international conference on Functional programming
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
UTLB: a mechanism for address translation on network interfaces
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Space/time-efficient scheduling and execution of parallel irregular computations
ACM Transactions on Programming Languages and Systems (TOPLAS)
MultiView and Millipage — fine-grain sharing in page-based DSMs
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Design challenges of virtual networks: fast, general-purpose communication
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
An efficient implementation of Java's remote method invocation
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
LOTEC: a simple DSM consistency protocol for nested object transactions
Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
Cluster I/O with River: making the fast case common
Proceedings of the sixth workshop on I/O in parallel and distributed systems
A tale of two directories: implementing distributed shared objects in Java
JAVA '99 Proceedings of the ACM 1999 conference on Java Grande
Responsiveness without interrupts
ICS '99 Proceedings of the 13th international conference on Supercomputing
Realizing the performance potential of the virtual interface architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
The design and evaluation of high performance communication using a Gigabit Ethernet
ICS '99 Proceedings of the 13th international conference on Supercomputing
Microservers: a new memory semantics for massively parallel computing
ICS '99 Proceedings of the 13th international conference on Supercomputing
A closer look at coscheduling approaches for a network of workstations
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Transposition table driven work scheduling in distributed search
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Teapot: A Domain-Specific Language for Writing Cache Coherence Protocols
IEEE Transactions on Software Engineering
SPINE: a safe programmable and integrated network environment
Proceedings of the 8th ACM SIGOPS European workshop on Support for composing distributed applications
Ace: a language for parallel programming with customizable protocols
ACM Transactions on Computer Systems (TOCS)
Portable and Efficient Parallel Computing Using the BSP Model
IEEE Transactions on Computers
Architectural requirements and scalability of the NAS parallel benchmarks
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Evaluating titanium SPMD programs on the Tera MTA
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
Performance prediction based loop scheduling for heterogeneous computing environment
SAC '97 Proceedings of the 1997 ACM symposium on Applied computing
Software-Based Rerouting for Fault-Tolerant Pipelined Communication
IEEE Transactions on Parallel and Distributed Systems
Using the VI architecture to build distributed, multithreaded runtime systems: a case study
SAC '00 Proceedings of the 2000 ACM symposium on Applied computing - Volume 2
High-Performance Routing in Networks of Workstations with Irregular Topology
IEEE Transactions on Parallel and Distributed Systems
On the Use of Virtual Channels in Networks of Workstations with Irregular Topology
IEEE Transactions on Parallel and Distributed Systems
Minimizing Data and Synchronization Costs in One-Way Communication
IEEE Transactions on Parallel and Distributed Systems
Accelerating shared virtual memory via general-purpose network interface support
ACM Transactions on Computer Systems (TOCS)
Profiling a parallel language based on fine-grained communication
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Low-latency communication on the IBM RISC system/6000 SP
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
OMPI: optimizing MPI programs using partial evaluation
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Multimethod communication for high-performance metacomputing applications
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Parallelizing the Murϕ Verifier
Formal Methods in System Design - Special issue on CAV '97
Architectural Support for Efficient Multicasting in Irregular Networks
IEEE Transactions on Parallel and Distributed Systems
Implementation of a portable software DSM in Java
Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande
LoGPC: Modeling Network Contention in Message-Passing Programs
IEEE Transactions on Parallel and Distributed Systems
QoS provisioning in clusters: an investigation of Router and NIC design
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
NanoFabrics: spatial computing using molecular electronics
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
LogGPS: a parallel computational model for synchronization analysis
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Implicit coscheduling: coordinated scheduling with implicit information in distributed systems
ACM Transactions on Computer Systems (TOCS)
Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing
IEEE Transactions on Parallel and Distributed Systems
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems
IEEE Transactions on Parallel and Distributed Systems
Optimistic active messages: structuring systems for high-performance communication
EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
Using active messages to support shared objects
EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
A new fast message passing communication system for multiprocessor workstation clusters
Progress in computer research
Efficient Java RMI for parallel programming
ACM Transactions on Programming Languages and Systems (TOPLAS)
An Efficient Adaptive Scheduling Scheme for Distributed Memory Multicomputers
IEEE Transactions on Parallel and Distributed Systems
A hierarchical load-balancing framework for dynamic multithreaded computations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
An implementation and analysis of the virtual interface architecture
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
User-space communication: a quantitative study
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
The effects of communication parameters on end performance of shared virtual memory clusters
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Multi-protocol active messages on a cluster of SMP's
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Evaluating the performance limitations of MPMD communication
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A system software architecture for high-end computing
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Dynamic memory management for programmable devices
Proceedings of the 3rd international symposium on Memory management
Guaranteed-quality parallel Delaunay refinement for restricted polyhedral domains
Proceedings of the eighteenth annual symposium on Computational geometry
The architecture of the DIVA processing-in-memory chip
ICS '02 Proceedings of the 16th international conference on Supercomputing
Queue pair IP: a hybrid architecture for system area networks
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A Performance Analysis of Transposition-Table-Driven Work Scheduling in Distributed Search
IEEE Transactions on Parallel and Distributed Systems
Design and implementation of FMPL, a fast message-passing library for remote memory operations
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
The Network RamDisk: Using remote memory on heterogeneous NOWs
Cluster Computing
Software Architecture for Processing Clusters Based on I2O
Cluster Computing
Control strategies for parallel mixed integer branch and bound
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Efficient parallel global garbage collection on massively parallel computers
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Distributed network computing over local ATM networks
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
A parallel Gauss-Seidel algorithm for sparse power system matrices
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
On the design of Chant: a talking threads package
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Maté: a tiny virtual machine for sensor networks
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Evolving RPC for active storage
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Implementing Concurrent Object-Oriented Languages on Multicomputers
IEEE Parallel & Distributed Technology: Systems & Technology
Models for Asynchronous Message Handling
IEEE Parallel & Distributed Technology: Systems & Technology
Concurrency: A Case Study in Remote Tasking and Distributed IPC in Mach
IEEE Parallel & Distributed Technology: Systems & Technology
Fast Messages: Efficient, Portable Communication for Workstation Clusters and MPPs
IEEE Parallel & Distributed Technology: Systems & Technology
COMPaS: A PC-Based SMP Cluster
IEEE Concurrency
Programming Languages for CSE: The State of the Art
IEEE Computational Science & Engineering
A Case for NOW (Networks of Workstations)
IEEE Micro
Assessing Fast Network Interfaces
IEEE Micro
Client-Server Computing on Shrimp
IEEE Micro
The Virtual Interface Architecture
IEEE Micro
Reducing Overheads of Local Communications in Fine-grain Parallel Computation
ICPP '97 Proceedings of the international Conference on Parallel Processing
PACK/UNPACK on Coarse-Grained Distributed Memory Parallel Machines
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Parallel Implementations of Irregular Problems Using High-Level Actor Language
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Software Techniques for Improving MPP Bulk-Transfer Performance
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Efficient Run-Time Support for Irregular Task Computations with Mixed Granularities
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Software Support for Virtual Memory-Mapped Communication
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Exploiting the Capabilities of Communications Co-Processors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Experience with Parallel Computing on the AN2 Network
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Design and Implementation of Virtual Memory-Mapped Communication on Myrinet
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Reducing Waiting Costs in User-Level Communication
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Optimizing Parallel Bitonic Sort
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Portals 3.0: Protocol Building Blocks for Low Overhead Communication
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Deadlock- and Livelock-Free Routing Protocols for Wave Switching
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Protocols and Strategies for Optimizing Performance of Remote Memory Operations on Clusters
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Priority Based Messaging for Software Distributed Shared Memory
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Emulating PetaFLOPS Machines and Blue Gene
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
An Approach to Asynchronous Object-Oriented Parallel and Distributed Computing on Wide-Area Systems
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Multilingual Debugging Support for Data-Driven and Thread-Based Parallel Languages
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
COOL Approach to Petaflops Computing (invited paper)
PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
An Efficient and Scalable Coscheduling Technique for Large Symmetric Multiprocessor Clusters
JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
PCI-DDC Application Programming Interface: Performance in User-Level Messaging (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Compiling Multithreaded Java Bytecode for Distributed Execution (Distinguished Paper)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
High-Speed LANs: New Environments for Parallel and Distributed Applications
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
A Network-Centric Approach to Embedded Software for Tiny Devices
EMSOFT '01 Proceedings of the First International Workshop on Embedded Software
The Mobile Object Layer: A Run-Time Substrate for Mobile Adaptive Computations
ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
Demand-Driven Dataflow for Concurrent Committed-Choice Code
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
The Plan-Du Style Compilation Technique for Eager Data Transfer in Thread-Based Execution
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
EM-C: Programming with Explicit Parallelism and Locality for EM-4 Multiprocessor
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
Communication and Synchronisation Using Interaction Objects
FM '99 Proceedings of the World Congress on Formal Methods in the Development of Computing Systems-Volume II
Memory Management in a PIM-Based Architecture
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Flexible and Optimized IDL Compilation for Distributed Applications
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
The Design for a High-Performance MPI Implementation on the Myrinet Network
Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
An MPI Implementation on the Top of the Virtual Interface Architecture
Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Building MPI for Multi-Programming Systems Using Implicit Information
Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Design and Implementation of MPI on Portals 3.0
Proceedings of the 9th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
KECho - Event Communication for Distributed Kernel Services
ARCS '02 Proceedings of the International Conference on Architecture of Computing Systems: Trends in Network and Pervasive Computing
Architecture Independent Analysis of Parallel Programs
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
The Effects of Network Contention on Processor Allocation Strategies
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Exploiting Implicit Parallelism in Functional Programs with SLAM
IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
Gilgamesh: a multithreaded processor-in-memory architecture for petaflops computing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
High-performance thread migration on clusters of SMPs
Cluster computing
A component-based approach to build a portable and flexible middleware for metacomputing
Parallel Computing - Special issue: Advanced environments for parallel and distributed computing
ACM SIGCOMM Computer Communication Review
SMiLE: an integrated, multi-paradigm software infrastructure for SCI-based clusters
Future Generation Computer Systems - Selected papers from CCGRID 2002
The nesC language: A holistic approach to networked embedded systems
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
ARMI: an adaptive, platform independent communication library
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
FTL: a multithreaded environment for parallel computation
CASCON '94 Proceedings of the 1994 conference of the Centre for Advanced Studies on Collaborative research
Executing Java threads in parallel in a distributed-memory environment
CASCON '98 Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research
A practical processor design for multithreading
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Integrating polling, interrupts, and thread management
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Fine-grain multi-thread processor architecture for massively parallel processing
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Using memory-mapped network interfaces to improve the performance of distributed shared memory
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Active I/O Switches in System Area Networks
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
NYNET Communication System (NCS): A Multithreaded Message Passing Tool Over ATM Network
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
CNI: A High-Performance Network Interface for Workstation Clusters
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
The APIC Approach to High Performance Network Interface Design: Protected DMA and Other Techniques
INFOCOM '97 Proceedings of the INFOCOM '97. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution
A Network Co-processor-Based Approach to Scalable Media Streaming in Servers
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
PIM Architectures to Support Petaflops Level Computation in the HTMT Machine
IWIA '99 Proceedings of the 1999 International Workshop on Innovative Architecture
Locality and Performance of Page- and Object-Based DSMs
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Efficient Fine-Grain Thread Migration with Active Threads
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
On Network CoProcessors for Scalable, Predictable Media Services
IEEE Transactions on Parallel and Distributed Systems
Sourcebook of parallel computing
Contention-Aware Communication Schedule for High-Speed Communication
Cluster Computing
High-speed I/O: the operating system as a signalling mechanism
NICELI '03 Proceedings of the ACM SIGCOMM workshop on Network-I/O convergence: experience, lessons, implications
Transport protocols for high performance
Communications of the ACM - Blueprint for the future of high-performance networking
TOSSIM: accurate and scalable simulation of entire TinyOS applications
Proceedings of the 1st international conference on Embedded networked sensor systems
A Load Balancing Framework for Adaptive and Asynchronous Applications
IEEE Transactions on Parallel and Distributed Systems
Receiving message prediction method
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Cluster communication protocols for parallel-programming systems
ACM Transactions on Computer Systems (TOCS)
Turning the postal system into a generic digital communication mechanism
Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications
Performance and modularity benefits of message-driven execution
Journal of Parallel and Distributed Computing
Non-strict execution in parallel and distributed computing
International Journal of Parallel Programming
A Multi-Platform Co-Array Fortran Compiler
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Journal of Systems Architecture: the EUROMICRO Journal
Predicting the Performance of Synchronous Discrete Event Simulation
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Analysis and Modeling of Advanced PIM Architecture Design Tradeoffs
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Coscheduling in Clusters: Is It a Viable Alternative?
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
BCS-MPI: A New Approach in the System Software Design for Large-Scale Parallel Computers
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
A Prototype Processing-In-Memory (PIM) Chip for the Data-Intensive Architecture (DIVA) System
Journal of VLSI Signal Processing Systems
Region streams: functional macroprogramming for sensor networks
DMSN '04 Proceedings of the 1st international workshop on Data management for sensor networks: in conjunction with VLDB 2004
Designing Efficient Java Communications on Clusters
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 5 - Volume 06
Impact of Page Size on Communication Performance
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
A low cost, multithreaded processing-in-memory system
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Design and Evaluation of an HPVM-Based Windows NT Supercomputer
International Journal of High Performance Computing Applications
International Journal of High Performance Computing Applications
Optimization of MPI collective communication on BlueGene/L systems
Proceedings of the 19th annual international conference on Supercomputing
Reducing Server Data Traffic Using a Hierarchical Computation Model
IEEE Transactions on Parallel and Distributed Systems
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Toward a Realistic Task Scheduling Model
IEEE Transactions on Parallel and Distributed Systems
High performance RDMA-based MPI implementation over InfiniBand
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Efficiently generating test vectors with state pruning
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
High Performance Remote Memory Access Communication: The ARMCI Approach
International Journal of High Performance Computing Applications
SmartApps: middle-ware for adaptive applications on reconfigurable platforms
ACM SIGOPS Operating Systems Review
Implementation and performance study of a hardware-VIA-based network adapter on gigabit ethernet
Journal of Systems Architecture: the EUROMICRO Journal
MMR: A MultiMedia Router architecture to support hybrid workloads
Journal of Parallel and Distributed Computing
PDCN'06 Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks
Efficient remote block-level I/O over an RDMA-capable NIC
Proceedings of the 20th annual international conference on Supercomputing
A case for high performance computing with virtual machines
Proceedings of the 20th annual international conference on Supercomputing
Scaling MPI to short-memory MPPs such as BG/L
Proceedings of the 20th annual international conference on Supercomputing
Alert-on-update: a communication aid for shared memory multiprocessors
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
International Journal of High Performance Computing Applications
A comprehensive performance and energy consumption analysis of scheduling alternatives in clusters
The Journal of Supercomputing
U-Net/SLE: A Java-based user-customizable virtual network interface
Scientific Programming
Flexible IDL compilation for complex communication patterns
Scientific Programming
Deadlock-free scheduling of X10 computations with bounded resources
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Programming sensor networks using abstract regions
NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Decentralized, adaptive resource allocation for sensor networks
NSDI'05 Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation - Volume 2
Alpine: a user-level infrastructure for network protocol development
USITS'01 Proceedings of the 3rd conference on USENIX Symposium on Internet Technologies and Systems - Volume 3
Nomad: migrating OS-bypass networks in virtual machines
Proceedings of the 3rd international conference on Virtual execution environments
WSDLite: a lightweight alternative to windows sockets direct path
WSS'00 Proceedings of the 4th conference on USENIX Windows Systems Symposium - Volume 4
An application experience with an implicitly parallel composition language
VHLLS'94 Proceedings of the USENIX 1994 Very High Level Languages Symposium
Message-driven relaxed consistency in a software distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Distributed filaments: efficient fine-grain parallelism on a cluster of workstations
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Cooperative caching: using remote client memory to improve file system performance
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Coordinated thread scheduling for workstation clusters under windows NT
NT'97 Proceedings of the USENIX Windows NT Workshop 1997
Experience with a language for writing coherence protocols
DSL'97 Proceedings of the Conference on Domain-Specific Languages (DSL), 1997
An object-oriented communication mechanism for parallel systems
COOTS'96 Proceedings of the 2nd conference on USENIX Conference on Object-Oriented Technologies (COOTS) - Volume 2
An application adaptation layer for wireless sensor networks
Pervasive and Mobile Computing
SLIC: an extensibility system for commodity operating systems
ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
An extensible protocol architecture for application-specific networking
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Implementation of a reliable remote memory pager
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
FLIPC: a low latency messaging system for distributed real time environments
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Efficient user-level thread migration and checkpointing on windows NT clusters
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
High-performance local area communication with fast sockets
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Charisma: orchestrating migratable parallel objects
Proceedings of the 16th international symposium on High performance distributed computing
Proceedings of the 21st annual international conference on Supercomputing
Parallel Languages and Compilers: Perspective From the Titanium Experience
International Journal of High Performance Computing Applications
Parallel Programmability and the Chapel Language
International Journal of High Performance Computing Applications
10 network papers that changed the world
ACM SIGCOMM Computer Communication Review
Towards an active network architecture
ACM SIGCOMM Computer Communication Review
Martini: A Network Interface Controller Chip for High Performance Computing with Distributed PCs
IEEE Transactions on Parallel and Distributed Systems
Tapping into the fountain of CPUs: on operating system support for programmable devices
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Implications of application usage characteristics for collective communication offload
International Journal of High Performance Computing and Networking
Scalable barrier synchronisation for large-scale shared-memory multiprocessors
International Journal of High Performance Computing and Networking
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
RISC: A resilient interconnection network for scalable cluster storage systems
Journal of Systems Architecture: the EUROMICRO Journal
Cronus: A platform for parallel code generation based on computational geometry methods
Journal of Systems and Software
Research note: On the assessment of input streams for incremental network computing
Journal of Parallel and Distributed Computing
Adapting a message-driven parallel application to GPU-accelerated clusters
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Runtime optimization of vector operations on large scale SMP clusters
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Specifying and Verifying Sensor Networks: An Experiment of Formal Methods
ICFEM '08 Proceedings of the 10th International Conference on Formal Methods and Software Engineering
Efficient, portable implementation of asynchronous multi-place programs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Pleiad: a cross-environment middleware providing efficient multithreading on clusters
Proceedings of the 6th ACM conference on Computing frontiers
TakTuk, adaptive deployment of remote executions
Proceedings of the 18th ACM international symposium on High performance distributed computing
Multicore Scheduling for Lightweight Communicating Processes
COORDINATION '09 Proceedings of the 11th International Conference on Coordination Models and Languages
A packet-switched network architecture for reconfigurable computing
ACM Transactions on Embedded Computing Systems (TECS)
Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency
Proceedings of the 46th Annual Design Automation Conference
A new ultra-low latency message transfer mechanism
CSN '07 Proceedings of the Sixth IASTED International Conference on Communication Systems and Networks
Design and implementation of message-passing services for the Blue Gene/L supercomputer
IBM Journal of Research and Development
IBM Journal of Research and Development
An integral approach to programming sensor networks
CCNC'09 Proceedings of the 6th IEEE Conference on Consumer Communications and Networking Conference
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Thread migration in a parallel graph reducer
IFL'02 Proceedings of the 14th international conference on Implementation of functional languages
Scalable multithreading in a low latency Myrinet cluster
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
AM++: a generalized active message framework
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An introduction to Balder: an OpenMP run-time library for clusters of SMPs
IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Tailoring a self-distributing architecture to a cluster computer environment
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
PVM application-level tuning over ATM
EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Comet: an active distributed key-value store
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
High-performance message-passing over generic Ethernet hardware with Open-MX
Parallel Computing
Parallel and distributed computing on multidomain non-routable networks
International Journal of High Performance Computing and Networking
A multi-protocol communication architecture for metacomputing
ICCOM'06 Proceedings of the 10th WSEAS international conference on Communications
Active pebbles: parallel programming for data-driven applications
Proceedings of the international conference on Supercomputing
A moving threads processor architecture MTPA
The Journal of Supercomputing
Hybrid PGAS runtime support for multicore nodes
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
An open-source compiler and runtime implementation for Coarray Fortran
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Designing a common communication subsystem
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
An integrated architecture for QoS-enable router and grid-oriented supercomputer
ICCNMC'05 Proceedings of the Third international conference on Networking and Mobile Computing
Paradis-Net: a network interface for parallel and distributed
ICN'05 Proceedings of the 4th international conference on Networking - Volume Part II
Super-Scalable algorithms for computing on 100,000 processors
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part I
Heterogeneous integration to simplify many-core architecture simulations
Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Multicore scheduling for lightweight communicating processes
Science of Computer Programming
Exploiting multidomain non routable networks
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
GPU programming in a high level language: compiling X10 to CUDA
Proceedings of the 2011 ACM SIGPLAN X10 Workshop
BLAST: broadband lightweight ATM secure transport for high-performance distributed computing
Computer Communications
Modelling and analysis of communication overhead for parallel matrix algorithms
Mathematical and Computer Modelling: An International Journal
Data-driven fault tolerance for work stealing computations
Proceedings of the 26th ACM international conference on Supercomputing
Composable, non-blocking collective operations on POWER7 IH
Proceedings of the 26th ACM international conference on Supercomputing
Global Futures: A Multithreaded Execution Model for Global Arrays-based Applications
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Avalanche: a fine-grained flow graph model for irregular applications on distributed-memory systems
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing
The Journal of Supercomputing
Improving communication latency with the write-only architecture
Journal of Parallel and Distributed Computing
A high-productivity task-based programming model for clusters
Concurrency and Computation: Practice & Experience
An efficient kernel-level blocking MPI implementation
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Expressing graph algorithms using generalized active messages
Proceedings of the 27th international ACM conference on supercomputing
APE: accelerator processor extensions to optimize data-compute co-location
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Active data: a data-centric approach to data life-cycle management
PDSW '13 Proceedings of the 8th Parallel Data Storage Workshop
Portable, MPI-interoperable coarray fortran
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) to allow communication to overlap computation, and (3) to coordinate the two without sacrificing processor cost/performance. We show that existing message-passing multiprocessors have unnecessarily high communication costs. Research prototypes of message-driven machines demonstrate low communication overhead but poor processor cost/performance. We introduce a simple communication mechanism, Active Messages, and show that it is intrinsic to both architectures, allows cost-effective use of the hardware, and offers tremendous flexibility. Implementations on the nCUBE/2 and CM-5 are described and evaluated using Split-C, a split-phase shared-memory extension to C. We further show that Active Messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed. With this mechanism, latency tolerance becomes a programming/compiling concern. Hardware support for Active Messages is desirable, and we outline a range of enhancements to mainstream processors.
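As a rough illustration of the mechanism the abstract names, the sketch below simulates two nodes in a single Python process. It assumes the usual reading of Active Messages, in which each message names a handler that the receiver runs on arrival, and it models a split-phase remote read in the spirit of Split-C. The names `Node`, `get_request`, `get_reply`, and `run_demo` are hypothetical and are not taken from the paper or from any real library.

```python
from collections import deque


class Node:
    """A simulated processor with local memory and an incoming message queue."""

    def __init__(self, node_id, network):
        self.id = node_id
        self.network = network  # hypothetical: dict mapping node id -> Node
        self.memory = {}
        self.inbox = deque()
        self.pending = 0  # outstanding split-phase requests

    def send(self, dest, handler, *args):
        # A message carries a reference to its handler plus a few arguments.
        self.network[dest].inbox.append((handler, args))

    def poll(self):
        # Run the handler of every message that has arrived so far.
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)

    def get(self, dest, remote_key, local_key):
        # Split-phase read: issue the request and return immediately.
        self.pending += 1
        self.send(dest, get_request, self.id, remote_key, local_key)


def get_request(node, requester, remote_key, local_key):
    # Runs on the node that owns the data; it only sends a reply.
    node.send(requester, get_reply, local_key, node.memory[remote_key])


def get_reply(node, local_key, value):
    # Runs on the requester; stores the value and marks the request complete.
    node.memory[local_key] = value
    node.pending -= 1


def run_demo():
    network = {}
    network[0] = Node(0, network)
    network[1] = Node(1, network)
    network[1].memory["x"] = 42

    network[0].get(1, "x", "x_copy")  # start the remote read
    local_work = sum(range(10))       # computation overlapping the request

    while network[0].pending:         # synchronize before using the value
        network[1].poll()
        network[0].poll()
    return network[0].memory["x_copy"], local_work
```

In this sketch the handlers do no real computation: they move data into memory and update a counter, leaving scheduling decisions to the program that polls. That is one way to read the abstract's remark that latency tolerance becomes a programming/compiling concern.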