Architects and industry have been searching for the next durable computational model, the next step beyond the standard CPU. Graphics co-processors, though ubiquitous and powerful, can be used effectively on only a limited range of stream-based applications. The UCSC Kestrel parallel processor is part of a continuum of parallel processing architectures, stretching from the application-specific through the application-specialized to the application-unspecific. Kestrel combines an ALU, multiplier, and local memory with Systolic Shared Registers, which merge communication and computation seamlessly, and an innovative condition stack for rapid conditionals. The result has been a readily programmable and efficient co-processor for a wide range of applications, including biological sequence analysis, image processing, and irregular problems. Experience with Kestrel indicates that programmable systolic processing, and its natural combination with the Single Instruction, Multiple Data (SIMD) parallel architecture, is the most powerful, flexible, and power-efficient computational model available for a large group of applications. Unlike approaches that try to displace or replace the standard serial processor, our model recognizes that the expansion of the application landscape and of performance requirements implies that the most efficient solution is a combination of more than one type of processor. We propose a model in which the CPU and the GPU are complemented by "the third big chip," a massively parallel SIMD processor.
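To make the two mechanisms named above more concrete, the following toy model sketches how a per-element condition stack and a neighbor-shared register might behave in a linear SIMD array. It is an illustration under stated assumptions, not the Kestrel instruction set: the class `ToySimdArray` and its methods (`push_if`, `invert`, `pop`, `apply`, `combine_with_left`) are hypothetical names introduced here, and the real hardware details are not given in this abstract.

```python
from typing import Callable, List


class ToySimdArray:
    """Toy linear SIMD array. Hypothetical model, NOT the Kestrel ISA."""

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)
        # Assumed design: one stack of condition bits per processing
        # element (PE). A PE is active only if every bit is True.
        self.cond: List[List[bool]] = [[] for _ in values]

    def _active(self, i: int) -> bool:
        return all(self.cond[i])  # an empty stack means "active"

    def push_if(self, pred: Callable[[int], bool]) -> None:
        """Begin a conditional: each PE pushes its own test result."""
        for i, v in enumerate(self.values):
            self.cond[i].append(pred(v))

    def invert(self) -> None:
        """Switch to the 'else' branch by flipping the top bit."""
        for stack in self.cond:
            stack[-1] = not stack[-1]

    def pop(self) -> None:
        """End the conditional."""
        for stack in self.cond:
            stack.pop()

    def apply(self, op: Callable[[int], int]) -> None:
        """Broadcast one instruction; inactive PEs keep their value."""
        self.values = [
            op(v) if self._active(i) else v
            for i, v in enumerate(self.values)
        ]

    def combine_with_left(self, op: Callable[[int, int], int],
                          boundary: int = 0) -> None:
        """One systolic step: each PE reads its left neighbor's value as
        an operand of the same instruction that computes its result."""
        old = self.values
        left = [boundary] + old[:-1]
        self.values = [
            op(own, lft) if self._active(i) else own
            for i, (own, lft) in enumerate(zip(old, left))
        ]


def demo() -> List[int]:
    array = ToySimdArray([3, -1, 4, -5])
    array.push_if(lambda v: v < 0)   # if value < 0:
    array.apply(lambda v: -v)        #     value = -value
    array.invert()                   # else:
    array.apply(lambda v: v * 2)     #     value = value * 2
    array.pop()
    array.combine_with_left(lambda own, lft: own + lft)
    return array.values


print(demo())  # -> [6, 7, 9, 13]
```

In this sketch every element executes the same instruction stream, and the condition stack only decides which elements commit a result, which is one common way to handle data-dependent branches in SIMD designs. The `combine_with_left` step reads the neighbor's operand and computes in a single operation, standing in for the merged communication and computation that the abstract attributes to Systolic Shared Registers.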