This paper presents the design and implementation of a high-performance special-purpose processor, called The White Dwarf, for accelerating finite element analysis algorithms. The White Dwarf CPU contains two Am29325 32-bit floating-point processors and one Am29332 32-bit ALU, and employs a wide-instruction-word architecture in which the application algorithm is implemented directly in microcode. The entire system is VME-bus compatible and interfaces with a SUN 3/160 host. The system's potential peak performance is 20 MFLOPS; a sustained computation rate in excess of 15 MFLOPS is expected. A speedup of between one and two orders of magnitude is possible. With a fully populated memory subsystem, the White Dwarf can accommodate finite element problems involving up to half a million nodes. The system is designed using an approach called Application-Specific Processor Design. A retargetable compiler has been developed that is capable of generating highly parallel and efficient code for the White Dwarf and other processors with similar architectures. System debug/integration is in progress; a highly useful system is expected.