This paper describes a mechanism for reducing the cost of waiting for messages in architectures that allow user-level communication libraries. We reduce waiting costs in two ways: by making interrupts cheaper to service, and by carefully controlling when the system uses interrupts and when it uses polling. We have implemented our mechanism on the SHRIMP multicomputer and integrated it with our user-level sockets library. Experiments show that a hybrid spin-then-block strategy offers good performance in a wide variety of situations, and that speeding up the interrupt path significantly improves performance.
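
The hybrid spin-then-block strategy can be illustrated with a minimal sketch. This is not the SHRIMP implementation: the `Mailbox` class, its `post` and `wait` methods, and the `spin_seconds` budget are hypothetical names chosen for illustration, and a `threading.Event` stands in for the interrupt-driven blocking path. The waiter polls for a bounded time, which is cheap when the message arrives soon, and only then falls back to blocking.

```python
import threading
import time


class Mailbox:
    """Hypothetical single-slot mailbox (illustration only, not SHRIMP's API)."""

    def __init__(self):
        self._ready = threading.Event()
        self._msg = None

    def post(self, msg):
        # Sender side: store the message, then signal its arrival.
        self._msg = msg
        self._ready.set()

    def wait(self, spin_seconds):
        """Spin for up to spin_seconds, then block.

        Returns (message, how), where how is "spin" or "block".
        The spin budget is an assumed tuning parameter.
        """
        deadline = time.monotonic() + spin_seconds
        # Phase 1: poll. No blocking primitive is used while spinning.
        while time.monotonic() < deadline:
            if self._ready.is_set():
                return self._take(), "spin"
        # Phase 2: block until the sender signals (stand-in for an interrupt).
        self._ready.wait()
        return self._take(), "block"

    def _take(self):
        # Consume the message and reset the slot for the next one.
        msg = self._msg
        self._msg = None
        self._ready.clear()
        return msg
```

With a message already posted, `wait(0.05)` returns during the polling phase; with a zero spin budget and a sender that posts later, the waiter goes straight to the blocking phase. Choosing the spin budget is the interesting design question: too short and every wait pays the blocking cost, too long and the processor is wasted on polling.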