Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
DPGA utilization and application
Proceedings of the 1996 ACM fourth international symposium on Field-programmable gate arrays
Communications of the ACM
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Improving the performance of speculatively parallel applications on the Hydra CMP
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A reconfigurable multi-function computing cache architecture
FPGA '00 Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays
Computer
VIS Speeds New Media Processing
IEEE Micro
Subword Parallelism with MAX-2
IEEE Micro
Hardware-Software Interactions on Mpact
IEEE Micro
The Chimaera reconfigurable functional unit
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Automatic Code Mapping on an Intelligent Memory Architecture
IEEE Transactions on Computers
SaarCOR: a hardware architecture for ray tracing
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
A common machine language for grid-based architectures
ACM SIGARCH Computer Architecture News
Evolving RPC for active storage
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Coping with Latency in SOC Design
IEEE Micro
A hierarchical three-way interconnect architecture for hexagonal processors
Proceedings of the 2003 international workshop on System-level interconnect prediction
A PIM-based Multiprocessor System
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Implementations of Real-time Data Intensive Applications on PIM-based Multiprocessor Systems
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Increasing and Detecting Memory Address Congruence
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
In-memory Parallelism for Database Workloads
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Compiling Application-Specific Hardware
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
A 64Mbit Mesochronous Hybrid Wave Pipelined Multibank DRAM Macro
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
The Characterization of Data Intensive Memory Workloads on Distributed PIM Systems
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Energy/Performance Design of Memory Hierarchies for Processor-in-Memory Chips
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Adaptively Mapping Code in an Intelligent Memory Architecture
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
FlexCache: A Framework for Flexible Compiler Generated Data Caching
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Parallel ray tracing on a chip
Practical parallel rendering
High-level synthesis of distributed logic-memory architectures
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Programming the FlexRAM parallel intelligent memory system
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploring the VLSI Scalability of Stream Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Extending Platform-Based Design to Network on Chip Systems
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Guaranteeing the quality of services in networks on chip
Networks on chip
On packet switched networks for on-chip communication
Networks on chip
A parallel computer as a NOC region
Networks on chip
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Universal Mechanisms for Data-Parallel Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Emerging software frameworks for exploiting Polymorphous Computing Architectures
OOPSLA '02 Companion of the 17th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Data forwarding through in-memory precomputation threads
Proceedings of the 18th annual international conference on Supercomputing
Synchroscalar: A Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor
Proceedings of the 31st annual international symposium on Computer architecture
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
The Vector-Thread Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The Vector-Thread Architecture
IEEE Micro
IEEE Transactions on Parallel and Distributed Systems
An Application Analysis Framework For Polymorphic Chip Multiprocessors
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
PIM lite: a multithreaded processor-in-memory prototype
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Realtime ray tracing of dynamic scenes on an FPGA chip
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
A highly configurable cache for low energy embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
A High Throughput String Matching Architecture for Intrusion Detection and Prevention
Proceedings of the 32nd annual international symposium on Computer Architecture
RPU: a programmable ray processing unit for realtime ray tracing
ACM SIGGRAPH 2005 Papers
Reducing Server Data Traffic Using a Hierarchical Computation Model
IEEE Transactions on Parallel and Distributed Systems
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST)
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
High-level synthesis using computation-unit integrated memories
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
The design and implementation of a low-latency on-chip network
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Constructing Virtual Architectures on a Tiled Processor
Proceedings of the International Symposium on Code Generation and Optimization
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Tile size selection for low-power tile-based architectures
Proceedings of the 3rd conference on Computing frontiers
Bit-split string-matching engines for intrusion detection and prevention
ACM Transactions on Architecture and Code Optimization (TACO)
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
Area-Performance Trade-offs in Tiled Dataflow Architectures
Proceedings of the 33rd annual international symposium on Computer Architecture
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Reducing control overhead in dataflow architectures
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Tartan: evaluating spatial computation for whole program execution
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A defect tolerant self-organizing nanoscale SIMD architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Real-time rendering systems in 2010
SIGGRAPH '05 ACM SIGGRAPH 2005 Courses
ALP: Efficient support for all levels of parallelism for complex media applications
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Computer Systems (TOCS)
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Core fusion: accommodating software diversity in chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
A self-organizing defect tolerant SIMD architecture
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
Proceedings of the 21st annual international conference on Supercomputing
A low-cost mixed-mode parallel processor architecture for embedded systems
Proceedings of the 21st annual international conference on Supercomputing
Chip multi-processor generator
Proceedings of the 44th annual Design Automation Conference
A shared memory module for asynchronous arrays of processors
EURASIP Journal on Embedded Systems
Efficiency trends and limits from comprehensive microarchitectural adaptivity
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Destructive-read in embedded DRAM, impact on power consumption
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Architecture and Evaluation of an Asynchronous Array of Simple Processors
Journal of Signal Processing Systems
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Compiler Controlled Speculation for Power Aware ILP Extraction in Dataflow Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Transactions on High-Performance Embedded Architectures and Compilers I
Verification of chip multiprocessor memory systems using a relaxed scoreboard
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Loop-Aware Instruction Scheduling with Dynamic Contention Tracking for Tiled Dataflow Architectures
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Synthesis of predictable networks-on-chip-based interconnect architectures for chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A memory system design framework: creating smart memories
Proceedings of the 36th annual international symposium on Computer architecture
PLUG: flexible lookup modules for rapid deployment of new protocols in high-speed routers
Proceedings of the ACM SIGCOMM 2009 conference on Data communication
A design methodology for domain-optimized power-efficient supercomputing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
FinFET-based power simulator for interconnection networks
ACM Journal on Emerging Technologies in Computing Systems (JETC)
ACM Transactions on Architecture and Code Optimization (TACO)
rMPI: message passing on multicore processors with on-chip interconnect
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Reconfiguration support for vector operations
International Journal of High Performance Systems Architecture
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
Journal of Systems Architecture: the EUROMICRO Journal
Design and implementation of the PLUG architecture for programmable and efficient network lookups
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An experimental study of optimizing bioinformatics applications
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A Predictive Model for Dynamic Microarchitectural Adaptivity Control
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
CoRAM: an in-fabric memory architecture for FPGA-based computing
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The accelerator store: A shared memory framework for accelerator-based systems
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Cache write-back schemes for embedded destructive-read DRAM
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Application-aware deadlock-free oblivious routing based on extended turn-model
Proceedings of the International Conference on Computer-Aided Design
Tiled multi-core stream architecture
Transactions on High-Performance Embedded Architectures and Compilers IV
SCIN-cache: Fast speculative versioning in multithreaded cores
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
NP-SARC: Scalable network processing in the SARC multi-core FPGA platform
Journal of Systems Architecture: the EUROMICRO Journal
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual networks -- distributed communication resource management
ACM Transactions on Reconfigurable Technology and Systems (TRETS) - Special Section on 19th Reconfigurable Architectures Workshop (RAW 2012)
Toward application-specific memory reconfiguration for energy efficiency
E2SC '13 Proceedings of the 1st International Workshop on Energy Efficient Supercomputing
Dynamic microarchitectural adaptation using machine learning
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Trends in VLSI technology scaling demand that future computing devices be narrowly focused to achieve high performance and high efficiency, yet also target the high volumes and low costs of widely applicable general purpose designs. To address these conflicting requirements, we propose a modular reconfigurable architecture called Smart Memories, targeted at computing needs in the 0.1&mgr; technology generation. A Smart Memories chip is made up of many processing tiles, each containing local memory, local interconnect, and a processor core. For efficient computation under a wide class of possible applications, the memories, the wires, and the computational model can all be altered to match the applications. To show the applicability of this design, two very different machines at opposite ends of the architectural spectrum, the Imagine stream processor and the Hydra speculative multiprocessor, are mapped onto the Smart Memories computing substrate. Simulations of the mappings show that the Smart Memories architecture can successfully map these architectures with only modest performance degradation.