Field-programmable gate arrays
Field-programmable gate arrays
Register allocation via graph coloring
Register allocation via graph coloring
Hitting the memory wall: implications of the obvious
ACM SIGARCH Computer Architecture News
Programmable active memories: reconfigurable systems come of age
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
25 years of the international symposia on Computer architecture (selected papers)
Some Deadlock Properties of Computer Systems
ACM Computing Surveys (CSUR)
The working set model for program behavior
Communications of the ACM
ISSS '00 Proceedings of the 13th international symposium on System synthesis
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Configuration relocation and defragmentation for run-time reconfigurable computing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Computer
Imagine: Media Processing with Streams
IEEE Micro
A Virtual Hardware Operating System for the Xilinx XC6200
FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
A dynamic reconfiguration run-time system
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Incremental reconfiguration for pipelined applications
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Instruction issue logic for pipelined supercomputers
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Quantitative analysis of vector code
PDP '95 Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing
A dynamic instruction set computer
FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
Reconfigurable Architectures for General-Purpose Computing
Reconfigurable Architectures for General-Purpose Computing
A Media-Enhanced Vector Architecture for Embedded Memory Systems
A Media-Enhanced Vector Architecture for Embedded Memory Systems
Vector microprocessors
Register allocation and spilling via graph coloring
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Refinement Maps for Efficient Verification of Processor Models
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
The Development of an Operating System for Reconfigurable Computing
FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Functional verification of the POWER5 microprocessor and POWER5 multiprocessor systems
IBM Journal of Research and Development - POWER5 and packaging
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Characterizing the Cell EIB On-Chip Network
IEEE Micro
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Application mapping for chip multiprocessors
Proceedings of the 45th annual Design Automation Conference
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Introduction to Reconfigurable Computing: Architectures, Algorithms, and Applications
Introduction to Reconfigurable Computing: Architectures, Algorithms, and Applications
A GALS many-core heterogeneous DSP platform with source-synchronous on-chip interconnection network
NOCS '09 Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip
An efficient algorithm for exploiting multiple arithmetic units
IBM Journal of Research and Development
Evaluation techniques for storage hierarchies
IBM Systems Journal
Functional verification of the POWER4 microprocessor and POWER4 multiprocessor systems
IBM Journal of Research and Development
Hi-index | 0.00 |
A new computation model called CACHE (Cache Architecture for Configurable Hardware Engine) is proposed in this paper. This model does not require a dedicated host processor and its software to harness the reconfiguration. Autonomous reconfiguration is performed within a working-set of application datapaths. The CACHE model has lots of side effects; caching, resource allocation and assignment, placement and routing, and defragmentation, with a processing array itself and a special register called a working-set register file. The model aims to reduce three major workloads: (1) the processor and application design workload, (2) runtime resource management and scheduling workload, and (3) reconfiguration workload. In order to reduce these workloads, processor architecture is definitely different from traditional computing model and its microprocessor architecture. There are three major ideas to construct the computing system: (1) an on-chip working-set model mainly in order to control load and store of streams, namely to control traffics introducing overheads, (2) an on-chip deadlock properties model mainly in order to manage resources and to continuously configure datapaths corresponding to a working-set window, (3) a cache memory technique to work for these models, the mechanism is equivalent to the working-set window, and the cache memory's procedure is equivalent to resource request, acquirement, and release of deadlock properties. The first model focuses onto streaming applications, for example vector and matrix operations, filters, and so on, which takes coarser grained operations such as integer operations of C-language. Regarding performance compared with DSPs, that comes from constant throughput across different scale of the applications. In addition, extended model, we call Instant model that automatically generates instance of a datapath, outperforms the DSPs. This paper shows its computation model, architecture, low-level design, and analyses about basic characteristics of the execution.