Compile-time partitioning and scheduling of parallel programs
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Code generation using tree matching and dynamic programming
ACM Transactions on Programming Languages and Systems (TOPLAS)
An architecture for optimal all-to-all personalized communication
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
TIERS: Topology independent pipelined routing and scheduling for VirtualWire compilation
FPGA '95 Proceedings of the 1995 ACM third international symposium on Field-programmable gate arrays
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
NuMesh: an architecture optimized for scheduled communication
The Journal of Supercomputing - Special issue on parallel and distributed processing
Supporting systolic and memory communication in iWarp
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Logic emulation with virtual wires
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Memory interfacing and instruction specification for reconfigurable processors
FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays
Computer Vision Algorithms on Reconfigurable Logic Arrays
IEEE Transactions on Parallel and Distributed Systems
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Kernel scheduling in reconfigurable computing
DATE '99 Proceedings of the conference on Design, automation and test in Europe
Performance analysis and optimization of latency insensitive systems
Proceedings of the 37th Annual Design Automation Conference
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Co-Synthesis to a Hybrid RISC/FPGA Architecture
Journal of VLSI Signal Processing Systems - Special issue on VLSI on custom computing technology
Architecture of the Atlas Chip-Multiprocessor: Dynamically Parallelizing Irregular Applications
IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
KressArray Xplorer: a new CAD environment to optimize reconfigurable datapath array
ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Coarse grain reconfigurable architecture (embedded tutorial)
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
Register-sensitive selection, duplication, and sequencing of instructions
ICS '01 Proceedings of the 15th international conference on Supercomputing
Towards a first vertical prototyping of an extremely fine-grained parallel programming approach
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
A framework for reconfigurable computing: task scheduling and context management
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - System Level Design
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Computers
Compiler Support for Scalable and Efficient Memory Systems
IEEE Transactions on Computers
Automatic Code Mapping on an Intelligent Memory Architecture
IEEE Transactions on Computers
SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures
IEEE Transactions on Parallel and Distributed Systems
Mapping a Single Assignment Programming Language to Reconfigurable Systems
The Journal of Supercomputing
An interleaved cache clustered VLIW processor
ICS '02 Proceedings of the 16th international conference on Supercomputing
Multithreading decoupled architectures for complexity-effective general purpose computing
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Towards nanocomputer architecture
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automated design synthesis and partitioning for adaptive reconfigurable hardware
Hardware implementation of intelligent systems
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer
International Journal of Parallel Programming
Coping with Latency in SOC Design
IEEE Micro
A Polymorphous Computing Fabric
IEEE Micro
Evaluating the XMT Parallel Programming Model
HIPS '01 Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Reactive Computer Vision System with Reconfigurable Architecture
ICVS '99 Proceedings of the First International Conference on Computer Vision Systems
A Parallel-Object Programming Model for PetaFLOPS Machines and Blue Gene/Cyclops
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Automatic Analysis of Loops to Exploit Operator Parallelism on Reconfigurable Systems
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Compiler Optimizations for Adaptive EPIC Processors
EMSOFT '01 Proceedings of the First International Workshop on Embedded Software
Generation of Design Suggestions for Coarse-Grain Reconfigurable Architectures
FPL '00 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
Energy/Performance Design of Memory Hierarchies for Processor-in-Memory Chips
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Adaptively Mapping Code in an Intelligent Memory Architecture
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Design-Space Exploration of Low Power Coarse Grained Reconfigurable Datapath Array Architectures
PATMOS '00 Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation
Dissecting Cyclops: a detailed analysis of a multithreaded architecture
ACM SIGARCH Computer Architecture News
Data communication estimation and reduction for reconfigurable systems
Proceedings of the 40th annual Design Automation Conference
Programming the FlexRAM parallel intelligent memory system
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallelizing Applications into Silicon
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Automatic Synthesis of Data Storage and Control Structures for FPGA-Based Computing Engines
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable Computing Systems
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
PACT XPP—A Self-Reconfigurable Data Processing Architecture
The Journal of Supercomputing
Exploring the VLSI Scalability of Stream Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Mapping of generalized template matching onto reconfigurable computers
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2001 international conference on computer design (ICCD)
Automatic compilation to a coarse-grained reconfigurable system-opn-chip
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Modeling technology impact on cluster microprocessor performance
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Polymorphous fabric-based systems: model, tools, applications
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Reconfigurable systems
Custom Wide Counterflow Pipelines for High-Performance Embedded Applications
IEEE Transactions on Computers
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices
Proceedings of the 1st conference on Computing frontiers
TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP
ACM Transactions on Architecture and Code Optimization (TACO)
Min-cut program decomposition for thread-level speculation
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
The Vector-Thread Architecture
Proceedings of the 31st annual international symposium on Computer architecture
Performance Efficiency of Context-Flow System-on-Chip Platform
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
When reconfigurable architecture meets network-on-chip
SBCCI '04 Proceedings of the 17th symposium on Integrated circuits and system design
A low power architecture for embedded perception
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
High-level power analysis for on-chip networks
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
An architecture and compiler for scalable on-chip communication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Vector-Thread Architecture
IEEE Micro
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Scalable Processor Instruction Set Extension
IEEE Design & Test
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST)
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Compiler-directed channel allocation for saving power in on-chip networks
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Compiling for EDGE Architectures
Proceedings of the International Symposium on Code Generation and Optimization
A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Reducing NoC energy consumption through compiler-directed channel voltage scaling
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Efficient address remapping in distributed shared-memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
Reconfigurable Coprocessor for Multimedia Application Domain
Journal of VLSI Signal Processing Systems
Modeling instruction placement on a spatial architecture
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
A spatial path scheduling algorithm for EDGE architectures
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
A collision model for randomized routing in fat-tree networks
Journal of Parallel and Distributed Computing
Speculative thread decomposition through empirical optimization
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient control generation for mapping nested loop programs onto processor arrays
Journal of Systems Architecture: the EUROMICRO Journal
The Journal of Supercomputing
Impulse: Memory system support for scientific applications
Scientific Programming
Implementation and Evaluation of a Dynamically Routed Processor Operand Network
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Implementing DSP Algorithms with On-Chip Networks
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
Proceedings of the 21st annual international conference on Supercomputing
3D-softchip: a novel architecture for next-generation adaptive computing systems
EURASIP Journal on Applied Signal Processing
Application driven embedded system design: a face recognition case study
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Improving instruction level parallelism through reconfigurable units in superscalar processors
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
An energy-efficient reconfigurable baseband processor for wireless communications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Journal of Parallel and Distributed Computing
DART: a functional-level reconfigurable architecture for high energy efficiency
EURASIP Journal on Embedded Systems - Reconfigurable Computing and Hardware/Software Codesign
High performance dense linear algebra on a spatially distributed processor
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Trends toward on-chip networked microsystems
International Journal of High Performance Computing and Networking
Run-time integration of reconfigurable video processing systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Counting Dependence Predictors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Diastolic arrays: throughput-driven reconfigurable computing
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Factored operating systems (fos): the case for a scalable operating system for multicores
ACM SIGOPS Operating Systems Review
Complexity Effective Bypass Networks
Transactions on High-Performance Embedded Architectures and Compilers II
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
Mapping stream programs onto heterogeneous multiprocessor systems
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
Evaluation of OpenMP for the cyclops multithreaded architecture
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Stream image processing on a dual-core embedded system
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
A Superscalar software architecture model for Multi-Core Processors (MCPs)
Journal of Systems and Software
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A power-efficient network on-chip topology
Proceedings of the Fifth International Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip
Communications of the ACM
Orchestration by approximation: mapping stream programs onto multicore architectures
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A graph drawing based spatial mapping algorithm for coarse-grained reconfigurable architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CRIB: consolidated rename, issue, and bypass
Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees
Proceedings of the 38th annual international symposium on Computer architecture
Effect of serialized routing resources on the implementation area of datapath circuits on FPGAS
WSEAS Transactions on Computers
Kismet: parallel speedup estimates for serial programs
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
HPC-Mesh: A Homogeneous Parallel Concentrated Mesh for Fault-Tolerance and Energy Savings
Proceedings of the 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems
L2-Cache hierarchical organizations for multi-core architectures
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
A framework for compiler driven design space exploration for embedded system customization
ASIAN'04 Proceedings of the 9th Asian Computing Science conference on Advances in Computer Science: dedicated to Jean-Louis Lassez on the Occasion of His 5th Cycle Birthday
Single FU bypass networks for high clock rate superscalar processors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Is dark silicon useful?: harnessing the four horsemen of the coming dark silicon apocalypse
Proceedings of the 49th Annual Design Automation Conference
Profile-guided deployment of stream programs on multicores
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Bit-by-Bit Pipelined and Hybrid-Grained 2D Architecture for Motion Estimation of H.264/AVC
Journal of Signal Processing Systems
StreamPI: a stream-parallel programming extension for object-oriented programming languages
The Journal of Supercomputing
Programming language design and analysis motivated by hardware evolution
SAS'07 Proceedings of the 14th international conference on Static Analysis
StreamTMC: Stream compilation for tiled multi-core architectures
Journal of Parallel and Distributed Computing
Tiled-MapReduce: Efficient and Flexible MapReduce Processing on Multicore with Tiling
ACM Transactions on Architecture and Code Optimization (TACO)
Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A general constraint-centric scheduling framework for spatial architectures
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
International Journal of Reconfigurable Computing - Special issue on Selected Papers from the 2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2011)
Dynamic expressivity with static optimization for streaming languages
Proceedings of the 7th ACM international conference on Distributed event-based systems
Proceedings of the First International Workshop on Many-core Embedded Systems
UNTANGLED: A Game Environment for Discovery of Creative Mapping Strategies
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
The sharing architecture: sub-core configurability for IaaS clouds
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 4.14 |
The most radical of the architectures that appear in this issue are Raw processors-highly parallel architectures with hundreds of very simple processors coupled to a small portion of the on-chip memory. Each processor, or tile, also contains a small bank of configurable logic, allowing synthesis of complex operations directly in configurable hardware. Unlike the others, this architecture does not use a traditional instruction set architecture. Instead, programs are compiled directly onto the Raw hardware, with all units told explicitly what to do by the compiler. The compiler even schedules most of the intertile communication. The real limitation to this architecture is the efficacy of the compiler. The authors demonstrate impressive speedups for simple algorithms that lend themselves well to this architectural model, but whether this architecture will be effective for future workloads is an open question.