A VLIW architecture for a trace scheduling compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
MIPS RISC architectures
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
Specification and design of embedded systems
The energy efficiency of IRAM architectures
Proceedings of the 24th annual international symposium on Computer architecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Active disks: programming model, algorithms and evaluation
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A cost-effective, high-bandwidth storage architecture
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
IEEE Micro
Mapping applications to the RaPiD configurable architecture
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Hierarchical processors-and-memory architecture for high performance computing
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Teramac-configurable custom computing
FCCM '95 Proceedings of the IEEE Symposium on FPGA's for Custom Computing Machines
Bulldog: a compiler for VLIW architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)
Energy/Performance Design of Memory Hierarchies for Processor-in-Memory Chips
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Adaptively Mapping Code in an Intelligent Memory Architecture
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Reducing Cost and Tolerating Defects in Page-based Intelligent Memory
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Cache Coherence in Intelligent Memory Systems
IEEE Transactions on Computers
Performance characteristics of MAUI: an intelligent memory system architecture
Proceedings of the 2005 workshop on Memory system performance
Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
This study compares the speed, area, and power of different implementations of Active Pages [OCS98], an intelligent memory system that helps bridge the growing gap between processor and memory performance by associating simple functions with each page of data. Previous investigations have shown speedups of up to 1000X when a block of reconfigurable logic placed next to each sub-array on a DRAM chip implements these functions.

In this study, we show that instruction-level parallelism, not hardware specialization, is the key to the previous success with reconfigurable logic. To demonstrate this, we developed an Active Page implementation based on a simplified VLIW processor. Unlike conventional VLIW processors, this design has a small number of pipeline stages, a consequence of its power and area constraints. Our results demonstrate that a four-wide VLIW processor attains performance comparable to that of pure FPGA logic while requiring significantly less area and power.