Embedded devices have hard performance targets and severe power and area constraints, so their design departs significantly from the intuitions developed for general-purpose microprocessors. This paper describes our initial experiences in designing Synchroscalar, a tile-based embedded architecture targeted at multi-rate signal processing applications. We present a preliminary design of the Synchroscalar architecture and some design space exploration in the context of important signal processing kernels. In particular, we find that synchronous design and substantial global interconnect are desirable in the low-frequency, low-power domain. This global interconnect enables parallelization and reduces processor idle time, both of which are critical to energy-efficient implementations of high-bandwidth signal processing. Furthermore, statically scheduled communication and SIMD computation keep control overheads low and energy efficiency high.