A performance analysis of automatically managed top of stack buffers
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
IEEE Transactions on Computers
Transputer reference manual
Warp: an integrated solution of high-speed parallel computing
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
High-speed top-of-stack scheme for VLSI processor: a management algorithm and its analysis
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Reduced instruction set computer architectures for vlsi (microprocessor, risc, multiple-windows - of - registers)
Many research and application fields are today computationally very demanding. This demand has driven the development of high-performance processors, vector processors and multiprocessor engines. This paper describes the development of the multiprocessing system FLIP-FLOP (Fast Link Periphery for a Forth Language Oriented Processor), which combines a stack-oriented processor kernel with a communication coprocessor for message passing.

In the past, many multiprocessor systems have been built from available microprocessors, for example the iPSC or the BBN Butterfly. On the other hand, special processors such as the INMOS Transputer [1] were constructed specifically for building multiprocessing systems. The FLIP-FLOP system belongs to the second class, and there are several similarities between FLIP-FLOP and the Transputer. The kernel of FLIP-FLOP uses a stack as its central data structure, but this stack is potentially unlimited. All data manipulations use this stack. A second stack handles addresses (subroutines, loops). Because the two stacks are independent, many operations can take place in parallel; for example, a subroutine return can be overlapped with an arbitrary other instruction. This dual-stack architecture directly supports the stack-oriented language FORTH and, as will be shown, leads to high performance in a small chip area.

It is therefore possible to integrate the communication coprocessor FLIP together with FLOP on one chip. FLIP was designed to be as simple as possible without losing performance in message passing. Despite this "RISC philosophy", the design is safe against deadlocks. Messages are sent as packets and are routed completely by the hardware over up to 16 links without affecting the FLOP part. One FLIP-FLOP involves four links (dedicated connections between two processors), which can transfer byte-parallel data in both directions at the same time (duplex mode).
At a clock frequency of 10 MHz, up to 40 MByte of data can be transferred per second. Incoming messages are stored in main memory via DMA in a special address region (input mailbox); outgoing messages are taken from another address region (output mailbox). This especially supports algorithms that use asynchronous message exchange, for example discrete distributed simulation driven by Time Warp.

It will be shown that the stack architecture and the message-passing principle work well together, which leads to a short response time to incoming messages. The RISC-like design of both parts, FLIP and FLOP, requires little chip area and makes it possible to implement a whole FLIP-FLOP system on one chip with a VLSI semi-custom design system.
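The dual-stack idea described above, with one stack for data and a separate one for addresses, can be illustrated with a small model. The following is only a sketch under stated assumptions: the opcode names, the tuple encoding and the per-instruction return flag are hypothetical and are not taken from the FLOP instruction set. It shows why a subroutine return can share a step with a data operation: the two actions touch different stacks.

```python
from dataclasses import dataclass, field


@dataclass
class DualStackMachine:
    """Toy model of a processor with separate data and address stacks.

    Each instruction is a hypothetical (opcode, operand, return_flag) tuple.
    The return flag is an assumed encoding: when set, the subroutine return
    is performed in the same step as the data operation.
    """
    program: list
    data: list = field(default_factory=list)     # data stack
    returns: list = field(default_factory=list)  # address stack
    pc: int = 0
    steps: int = 0

    def step(self):
        op, arg, ret = self.program[self.pc]
        next_pc = self.pc + 1
        if op == "push":
            self.data.append(arg)
        elif op == "dup":
            self.data.append(self.data[-1])
        elif op == "add":
            b, a = self.data.pop(), self.data.pop()
            self.data.append(a + b)
        elif op == "call":
            self.returns.append(next_pc)  # only the address stack is touched
            next_pc = arg
        elif op == "halt":
            next_pc = None
        if ret:
            # The return uses the address stack only, so it does not
            # conflict with the data-stack operation executed above.
            next_pc = self.returns.pop()
        self.pc = next_pc
        self.steps += 1

    def run(self):
        while self.pc is not None:
            self.step()
        return self.data


# Main code: push a value, call a "double" subroutine, halt.
PROGRAM = [
    ("push", 21, False),   # 0
    ("call", 3, False),    # 1
    ("halt", None, False), # 2
    ("dup", None, False),  # 3: subroutine entry
    ("add", None, True),   # 4: add and return in the same step
]

machine = DualStackMachine(PROGRAM)
result = machine.run()
```

In this model the subroutine needs no separate return instruction, so the whole program takes five steps instead of six. Whether the real hardware encodes the overlap in this way is an assumption of the sketch.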
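The transfer figure and the mailbox scheme mentioned above can be made concrete with a short sketch. It rests on assumptions that the abstract does not state: that each of the four links moves one byte per clock cycle, that the 40 MByte figure is an aggregate per second in one direction, and that a full mailbox refuses further packets. The names `link_bandwidth` and `Mailbox` are hypothetical.

```python
from collections import deque


def link_bandwidth(clock_hz, links, bytes_per_cycle=1):
    """Aggregate bytes per second, assuming one transfer per link per clock."""
    return clock_hz * links * bytes_per_cycle


class Mailbox:
    """Toy model of a memory region that holds whole packets.

    In the system described above the link hardware fills the input mailbox
    via DMA; here `deposit` stands in for that hardware, and `collect` for
    the processor reading the region when it is ready.
    """

    def __init__(self, capacity):
        self.capacity = capacity  # assumed limit, in packets
        self.packets = deque()

    def deposit(self, packet):
        # Overflow behaviour is an assumption: the packet is refused.
        if len(self.packets) >= self.capacity:
            return False
        self.packets.append(packet)
        return True

    def collect(self):
        # Returns None when no message is waiting, so the caller never blocks.
        return self.packets.popleft() if self.packets else None


total = link_bandwidth(clock_hz=10_000_000, links=4)

inbox = Mailbox(capacity=2)
accepted = [inbox.deposit(p) for p in (b"first", b"second", b"third")]
oldest = inbox.collect()
```

Because the sender only deposits and the receiver only collects, neither side waits for the other, which is the property that asynchronous algorithms such as Time Warp simulation rely on.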