Finding the Next Computational Model: Experience with the UCSC Kestrel

Authors:
Richard Hughey; Andrea Blas
Affiliations:
Department of Computer Engineering, University of California, Santa Cruz, USA 95064; Department of Computer Engineering, University of California, Santa Cruz, USA 95064
Venue:
Journal of Signal Processing Systems
Year:
2008

Abstract

Architects and industry have been searching for the next durable computational model, the next step beyond the standard CPU. Graphics co-processors, though ubiquitous and powerful, can be used effectively only on a limited range of stream-based applications. The UCSC Kestrel parallel processor is part of a continuum of parallel processing architectures, stretching from the application-specific through the application-specialized to the application-unspecific. Kestrel combines an ALU, multiplier, and local memory with Systolic Shared Registers for seamless merging of communication and computation, and an innovative condition stack for rapid conditionals. The result has been a readily programmable and efficient co-processor for a wide range of applications, including biological sequence analysis, image processing, and irregular problems. Experience with Kestrel indicates that programmable systolic processing, and its natural combination with the Single Instruction-Multiple Data (SIMD) parallel architecture, is the most powerful, flexible, and power-efficient computational model available for a large group of applications. Unlike other approaches that try to displace or replace the standard serial processor, our model recognizes that the expansion of the application landscape and of performance requirements implies that the most efficient solution combines more than one type of processor. We propose a model in which the CPU and the GPU are complemented by "the third big chip," a massively parallel SIMD processor.