IEEE Transactions on Computers
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Instruction Set Extensions for MPEG-4 Video
Journal of VLSI Signal Processing Systems - Special issue on implementation of MPEG-4 multimedia codecs
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Communicating sequential processes
Communications of the ACM
Reducing wire delay penalty through value prediction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Microprocessor Architectures: From VLIW to Tta
Microprocessor Architectures: From VLIW to Tta
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems
IEEE Design & Test
The PowerPC 604 RISC microprocessor
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Imagine: Media Processing with Streams
IEEE Micro
Computer
Hierarchical Scheduling Windows
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Itanium 2 Processor Microarchitecture
IEEE Micro
The Impact of SMT/SMP Designs on Multimedia Software Engineering " A Workload Analysis Study
MSE '02 Proceedings of the Fourth IEEE International Symposium on Multimedia Software Engineering
MPEG-2 Video Decompression on Simultaneous Multithreaded Multimedia Processors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) -Volume 1 - Volume 1
Multicore system-on-chip architecture for MPEG-4 streaming video
IEEE Transactions on Circuits and Systems for Video Technology
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
Hi-index | 0.00 |
A scalable, distributed, processor architecture is presented that emphasizes on high performance computing for digital signal processing applications by combining high frequency design techniques with a very high degree of parallel processing on a chip. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1], that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (SMT). Consequent application of fine clustering reduces the cycle-time for wire-sensitive building blocks of the processor like the register file or the instruction scheduler and leads to a distributed architecture model, where independent thread processing units, ALUs, registers files and memories are distributed across the chip and communicate with each other by special networks. The performance of the architecture is scalable with both the number of function units and the number of thread units without having any impact on the processors cycle-time.