Design and analysis of adaptive processor

Authors:
Shigeyuki Takano
Affiliations:
Sanyo LSI Design System Soft Co., Ltd.
Venue:
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Year:
2012

Abstract

This paper proposes a new computation model called CACHE (Cache Architecture for Configurable Hardware Engine). The model does not require a dedicated host processor, or software running on one, to manage reconfiguration. Instead, reconfiguration is performed autonomously within a working set of application datapaths. The CACHE model has several side effects: caching, resource allocation and assignment, placement and routing, and defragmentation are all carried out by the processing array itself together with a special register file called the working-set register file. The model aims to reduce three major workloads: (1) the processor and application design workload, (2) the runtime resource management and scheduling workload, and (3) the reconfiguration workload. To reduce these workloads, the processor architecture differs substantially from the traditional computing model and its microprocessor architecture. Three major ideas underlie the computing system: (1) an on-chip working-set model, mainly to control the loading and storing of streams, that is, to control the traffic that introduces overhead; (2) an on-chip deadlock-properties model, mainly to manage resources and to continuously configure the datapaths corresponding to a working-set window; and (3) a cache memory technique that serves both models, in which the cache mechanism is equivalent to the working-set window and the cache memory's procedure is equivalent to the resource request, acquisition, and release steps of the deadlock properties. The first model focuses on streaming applications, such as vector and matrix operations and filters, which use coarser-grained operations such as C-language integer operations. Its performance relative to DSPs comes from constant throughput across applications of different scales. In addition, an extended model, which we call the Instant model and which automatically generates instances of a datapath, outperforms the DSPs.
This paper presents the computation model, architecture, and low-level design, and analyzes the basic characteristics of execution.
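
The correspondence the abstract draws between a cache and the request, acquisition, and release of resources can be pictured with a small software sketch. The following is only an illustration under stated assumptions, not the paper's hardware mechanism: the class name `WorkingSetCache`, the slot count, the datapath names, and the least-recently-used replacement policy are all hypothetical choices made for this example.

```python
from collections import OrderedDict


class WorkingSetCache:
    """Toy model: a fixed number of array slots holding datapath configurations.

    Hypothetical illustration only; LRU replacement is an assumption,
    not a policy stated in the abstract.
    """

    def __init__(self, slots):
        self.slots = slots
        self.resident = OrderedDict()  # datapath id -> True, oldest first

    def request(self, datapath_id):
        """Request a datapath; return 'hit', 'acquire', or 'replace'."""
        if datapath_id in self.resident:
            # Already configured: refresh its recency, no reconfiguration.
            self.resident.move_to_end(datapath_id)
            return "hit"
        if len(self.resident) < self.slots:
            # A free slot exists: acquire it for the new datapath.
            self.resident[datapath_id] = True
            return "acquire"
        # No free slot: release the least recently used datapath first.
        self.resident.popitem(last=False)
        self.resident[datapath_id] = True
        return "replace"


# Hypothetical request trace of streaming kernels on a two-slot array.
trace = ["fir", "dot", "fir", "fft", "dot", "fir"]
cache = WorkingSetCache(slots=2)
events = [cache.request(d) for d in trace]
print(events)
```

In this sketch a "hit" stands for a datapath that is already inside the working-set window, while "acquire" and "replace" stand for the cases in which a resource must first be obtained, with or without releasing another one.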