This paper studies the impact of page size on communication performance. In cluster interconnects, the network interface card (NIC) usually translates virtual addresses to physical addresses through an address translation table (ATT), which resides in NIC memory and can be viewed as the translation look-aside buffer (TLB) of the NIC processor. The operating system's page size affects not only the compulsory and capacity miss rates of the ATT but also, in some implementations, its hit time and miss penalty. A larger page size yields a lower ATT miss rate, a shorter hit time, and a smaller miss penalty, all of which improve communication performance. To measure this effect, we implemented a Linux module for the AMD Opteron processor that allocates both normal pages and super pages, and extended the address translation mechanism in Myrinet GM to support either page type. With super pages, ping-pong latency is reduced by as much as 4.3 µs and bandwidth improves by as much as 55.3 MB/s in some cases. Linpack results on the 11 TFLOPS Dawning 4000A show that Linpack efficiency increases by 0.66% to 2.86%, depending on the number of processors.
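The relationship between page size and ATT miss rate can be illustrated with a small simulation. The following is a minimal sketch, not the mechanism used in Myrinet GM: it assumes a hypothetical ATT with 1024 entries and LRU replacement, a 64 MB communication buffer swept sequentially twice, and page sizes of 4 KB and 2 MB (the two sizes commonly available on x86-64; the super page size actually used in the study is not stated here). The function name `att_misses` and all parameters are illustrative.

```python
from collections import OrderedDict

KB = 1024
MB = 1024 * KB


def att_misses(buffer_bytes, page_size, att_entries, passes=2, stride=4 * KB):
    """Count misses in a hypothetical LRU address translation table (ATT).

    The buffer is swept sequentially `passes` times, touching one address
    every `stride` bytes. Each distinct virtual page needs one ATT entry.
    """
    att = OrderedDict()  # keys are virtual page numbers, ordered by recency
    misses = 0
    for _ in range(passes):
        for addr in range(0, buffer_bytes, stride):
            page = addr // page_size
            if page in att:
                att.move_to_end(page)  # hit: mark as most recently used
            else:
                misses += 1  # miss: install a translation for this page
                att[page] = True
                if len(att) > att_entries:
                    att.popitem(last=False)  # evict least recently used
    return misses


# Assumed configuration: 64 MB buffer, 1024-entry ATT.
buffer_bytes = 64 * MB
small = att_misses(buffer_bytes, 4 * KB, att_entries=1024)
large = att_misses(buffer_bytes, 2 * MB, att_entries=1024)
print(small, large)  # prints "32768 32"
```

Under these assumptions the 4 KB case needs 16384 translations, which exceeds the ATT capacity, so every access in both passes misses (capacity misses). The 2 MB case needs only 32 translations, so the only misses are the 32 compulsory ones in the first pass.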