Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
NuMesh: an architecture optimized for scheduled communication
The Journal of Supercomputing - Special issue on parallel and distributed processing
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Journal of VLSI Signal Processing Systems - Special issue on VLSI on custom computing technology
Power and performance evaluation of globally asynchronous locally synchronous processors
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Software Synthesis from Dataflow Graphs
Software Synthesis from Dataflow Graphs
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Synthesis of Embedded Software from Synchronous Dataflow Specifications
Journal of VLSI Signal Processing Systems
Specifying and Compiling Applications for RaPiD
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
aSOC: A Scalable, Single-Chip Communications Architecture
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Interconnect Architecture Exploration for Low-Energy Reconfigurable Single-Chip DSPs
WVLSI '99 Proceedings of the IEEE Computer Society Workshop on VLSI'99
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
A Low Power 200 MHz Multiported Register File for the Vector-IRAM Chip
A Low Power 200 MHz Multiported Register File for the Vector-IRAM Chip
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
A low-cost and low-power multi-standard video encoder
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Tile size selection for low-power tile-based architectures
Proceedings of the 3rd conference on Computing frontiers
High-level power analysis for multi-core chips
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Synchroscalar: Evaluation of an embedded, multi-core architecture for media applications
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Architecture and Evaluation of an Asynchronous Array of Simple Processors
Journal of Signal Processing Systems
Transactions on High-Performance Embedded Architectures and Compilers I
Computers and Electrical Engineering
An efficient placement algorithm for run-time reconfigurable embedded system
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Integrated CPU cache power management in multiple clock domain processors
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
A pattern for efficient parallel computation on multicore processors with scalar operand networks
Proceedings of the 2010 Workshop on Parallel Programming Patterns
High performance, energy efficiency, and scalability with GALS chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An efficient non-blocking multithreaded embedded system
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
AFReP: application-guided function-level registerfile power-gating for embedded processors
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
We present Synchroscalar, a tile-based architecture forembedded processing that is designed to provide the flexibilityof DSPs while approaching the power efficiency ofASICs. We achieve this goal by providing high parallelismand voltage scaling while minimizing control and communicationcosts. Specifically, Synchroscalar uses columnsof processor tiles organized into statically-assignedfrequency-voltage domains to minimize power consumption.Furthermore, while columns use SIMD control to minimizeoverhead, data-dependent computations can besupported by extremely flexible statically-scheduled communicationbetween columns.We provide a detailed evaluation of Synchroscalar includingSPICE simulation, wire and device models, synthesisof key components, cycle-level simulation, andcompiler- and hand-optimized signal processing applications.We find that the goal of meeting, not exceeding, performancetargets with data-parallel applications leads todesigns that depart significantly from our intuitions derivedfrom general-purpose microprocessor design. Inparticular, synchronous design and substantial global interconnectare desirable in the low-frequency, low-powerdomain. This global interconnect supports parallelizationand reduces processor idle time, which are critical to energyefficient implementations of high bandwidth signalprocessing. Overall, Synchroscalar provides programmabilitywhile achieving power efficiencies within 8-30X ofknown ASIC implementations, which is 10-60X better thanconventional DSPs. In addition, frequency-voltage scalingin Synchroscalar provides between 3-32% power savingsin our application suite.