Kestrel is a high-performance programmable parallel co-processor. Its design is the result of examination and reexamination of algorithmic, architectural, packaging, and silicon design issues, and the interrelations between them. The final system features a linear array of 8-bit processing elements, each with local memory, an arithmetic logic unit (ALU), a multiplier, and other functional units. Sixty-four Kestrel processing elements fit in a 1.4-million-transistor, 60 mm^2, 0.5-micron CMOS chip with just 84 pins. The planned single-board, 8-chip system will, for some applications, provide supercomputer performance at a fraction of the cost. This paper surveys four of our applications (sequence analysis, neural networks, image compression, and floating-point arithmetic), and discusses the philosophy behind many of the design decisions. We present the processing element and system architectures, emphasizing the ALU and comparator's compact instruction encoding and design, the architecture's facility with nested conditionals, and the multiplier's flexibility in performing multiprecision operations. Finally, we discuss the implementation and performance of the Kestrel test chips.
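
As a rough illustration of what multiprecision multiplication on 8-bit hardware involves, the sketch below builds a wide product out of 8-bit by 8-bit partial products using the generic schoolbook method. It is not Kestrel's instruction sequence or multiplier design, which the abstract does not describe; the function names `to_limbs`, `from_limbs`, and `multiply_limbs` are hypothetical and chosen only for this example.

```python
def to_limbs(value, count):
    """Split a non-negative integer into `count` 8-bit limbs, least significant first."""
    return [(value >> (8 * i)) & 0xFF for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian 8-bit limbs."""
    return sum(limb << (8 * i) for i, limb in enumerate(limbs))


def multiply_limbs(a, b):
    """Schoolbook product of two little-endian 8-bit limb lists.

    Illustrative assumption: each step uses one 8x8 -> 16-bit multiply
    plus two 8-bit additions, which is a generic scheme and not
    necessarily how the Kestrel multiplier is organized.
    """
    result = [0] * (len(a) + len(b))
    for i, a_limb in enumerate(a):
        carry = 0
        for j, b_limb in enumerate(b):
            # 255*255 + 255 + 255 = 65535, so the total always fits in 16 bits.
            total = a_limb * b_limb + result[i + j] + carry
            result[i + j] = total & 0xFF   # low byte stays in this position
            carry = total >> 8             # high byte moves to the next position
        result[i + len(b)] = carry
    return result


if __name__ == "__main__":
    x, y = 0x1234, 0xBEEF
    product = multiply_limbs(to_limbs(x, 2), to_limbs(y, 2))
    print(hex(from_limbs(product)), from_limbs(product) == x * y)
```

The inner-loop bound in the comment is the reason an 8-bit datapath suffices here: the partial product, the accumulated byte, and the incoming carry never exceed 16 bits together, so the carry passed forward is always a single byte.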