Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Realization of a programmable parallel DSP for high performance image processing applications
DAC '98 Proceedings of the 35th annual Design Automation Conference
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Instruction Set Extensions for MPEG-4 Video
Journal of VLSI Signal Processing Systems - Special issue on implementation of MPEG-4 multimedia codecs
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Microprocessor Architectures: From VLIW to Tta
Microprocessor Architectures: From VLIW to Tta
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor
IEEE Micro
Measuring the Performance of Multimedia Instruction Sets
IEEE Transactions on Computers
Computer
The Impact of SMT/SMP Designs on Multimedia Software Engineering " A Workload Analysis Study
MSE '02 Proceedings of the Fourth IEEE International Symposium on Multimedia Software Engineering
HiBRID-SoC: A Multi-Core System-on-Chip Architecture for Multimedia Signal Processing Applications
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe: Designers' Forum - Volume 2
Instruction Issue Logic in Pipelined Supercomputers
IEEE Transactions on Computers
An efficient algorithm for exploiting multiple arithmetic units
IBM Journal of Research and Development
Multicore system-on-chip architecture for MPEG-4 streaming video
IEEE Transactions on Circuits and Systems for Video Technology
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Hi-index | 0.00 |
A scalable, distributed micro-architecture is presented that emphasizes on high performance computing for digital signal processing applications by combining high frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with out-of-order execution, that supports specialized, complex DSP function units, and simultaneous instruction issue from multiple independent threads (SMT). Consequent application of fine clustering reduces the cycle-time for wire-sensitive building blocks of the processor like the register file and leads to a distributed architecture model, where independent thread processing units, ALUs, registers files and memories are distributed across the chip and communicate with each other by special networks, forming a ”network-on-a-chip” (NOC) [1]. The communication protocol is a modified version of Tomasulo's scheme [2], that was extended to eliminate all central control structures for the data flow and to support multithreading. The performance of the architecture is scalable with both the number of function units and the number of thread units without having any impact on the processors cycle-time.