In multiprocessor systems, each processing node contains a processor, some cache memory, and a share of the system memory, and the nodes are connected through a scalable interconnection. The memory partitions may form a single shared address space (shared memory) or remain disjoint (message passing). Within each class of systems, many architectural variations are possible. Fair comparisons among these systems are difficult without a common hardware platform on which to implement the different architectures. RPM (Rapid Prototyping engine for Multiprocessors), a hardware emulator for the rapid prototyping of various multiprocessor architectures, provides this platform. The authors describe its architecture, performance, and prototyping methodology. Reprogrammable controllers implemented with field-programmable gate arrays emulate the target machine's hardware. The processors, memories, and interconnections are off the shelf, and their relative speeds can be modified to emulate various component technologies. The authors also compare RPM with other rapid prototyping approaches. Because emulation is orders of magnitude faster than simulation, an emulator can run problems with large data sets that are more representative of the workloads for which the target machine is designed. An emulator also supports more reliable performance evaluation and design. Finally, because an emulator is a real computer with its own I/O, every emulation is an actual incarnation of the target and can run several different workloads.
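To make the idea of modifying relative component speeds concrete, the following is a minimal sketch of one way an emulator could reproduce a target machine's latency ratios with off-the-shelf parts. It is an illustration under stated assumptions, not RPM's documented mechanism: it assumes each component's physical latency (in nanoseconds) and its desired target latency (in processor cycles) are known, that the emulated processor clock can be slowed, and that the controllers can pad any access with extra delay. The names `Component` and `plan_emulation` and all numbers are hypothetical. The emulated cycle time must be long enough for the slowest component relative to its target, so it is the maximum over components of physical latency divided by target cycles; every other component is then padded up to its target.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical description of one off-the-shelf emulator component."""
    name: str
    physical_ns: float   # measured latency of the real part, in nanoseconds (assumed)
    target_cycles: int   # latency the target machine should show, in processor cycles


def plan_emulation(components):
    """Choose an emulated processor cycle time and per-component padding.

    The cycle must be long enough that every physical component completes
    within its target number of cycles, so take the maximum ratio. Each
    component is then delayed so its latency equals target_cycles cycles.
    """
    cycle_ns = max(c.physical_ns / c.target_cycles for c in components)
    padding_ns = {
        c.name: c.target_cycles * cycle_ns - c.physical_ns for c in components
    }
    return cycle_ns, padding_ns


if __name__ == "__main__":
    # Hypothetical target: 1-cycle cache, 10-cycle local memory, 100-cycle remote memory.
    parts = [
        Component("cache", 20.0, 1),
        Component("local_memory", 120.0, 10),
        Component("remote_memory", 600.0, 100),
    ]
    cycle, pad = plan_emulation(parts)
    print(cycle, pad)
```

With these made-up numbers the cache is the bottleneck, so the emulated cycle is 20 ns (a 50 MHz processor clock), local memory is padded by 80 ns, and remote memory by 1400 ns. Changing only `target_cycles` lets the same hardware stand in for a different balance of processor, memory, and network technologies, which is the kind of flexibility the text attributes to emulation.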