A scalable, clustered SMT processor for digital signal processing

Authors:
Mladen Berekovic;Sören Moch;Peter Pirsch
Affiliations:
University of Hannover, Germany;University of Hannover, Germany;University of Hannover, Germany
Venue:
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture
Year:
2003

Citing 39
Cited 1

Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

IEEE Transactions on Computers
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An elementary processor architecture with simultaneous instruction issuing from multiple threads

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors

Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Instruction Set Extensions for MPEG-4 Video

Journal of VLSI Signal Processing Systems - Special issue on implementation of MPEG-4 multimedia codecs
Lx: a technology platform for customizable VLIW embedded processing

Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Communicating sequential processes

Communications of the ACM
Reducing wire delay penalty through value prediction

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Microprocessor Architectures: From VLIW to Tta

Microprocessor Architectures: From VLIW to Tta
A design space evaluation of grid processor architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Select-free instruction scheduling logic

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
2001 Technology Roadmap for Semiconductors

Computer
Networks on Chips: A New SoC Paradigm

Computer
Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems

IEEE Design & Test
The PowerPC 604 RISC microprocessor

IEEE Micro
Accelerating Multimedia with Enhanced Microprocessors

IEEE Micro
MMX Technology Extension to the Intel Architecture

IEEE Micro
The Alpha 21264 Microprocessor

IEEE Micro
Imagine: Media Processing with Streams

IEEE Micro
The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs

IEEE Micro
The Softening of Hardware

Computer
Hierarchical Scheduling Windows

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Calisto: A Low-Power Single-Chip Multiprocessor Communications Platform

IEEE Micro
Itanium 2 Processor Microarchitecture

IEEE Micro
Hyperthreading Technology in the Netburst Microarchitecture

IEEE Micro
The Impact of SMT/SMP Designs on Multimedia Software Engineering - A Workload Analysis Study

MSE '02 Proceedings of the Fourth IEEE International Symposium on Multimedia Software Engineering
MPEG-2 Video Decompression on Simultaneous Multithreaded Multimedia Processors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Code Positioning to Reduce Instruction Cache Misses in Signal Processing Applications on Multimedia RISC Processors

ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) - Volume 1
Multicore system-on-chip architecture for MPEG-4 streaming video

IEEE Transactions on Circuits and Systems for Video Technology

A distributed, simultaneously multi-threaded (SMT) processor with clustered scheduling windows for scalable DSP performance

Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP

Abstract

A scalable, distributed processor architecture is presented that emphasizes high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1] that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the instruction scheduler, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files and memories are distributed across the chip and communicate with each other over special networks. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
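
The general idea of Tomasulo-style issue from several threads can be illustrated with a toy model. The sketch below is not the authors' design: it is a minimal illustration that assumes a two-operand instruction format, one private register file per thread, and a single shared pool of reservation stations (the paper, by contrast, distributes these structures into clusters). All names (`Station`, `simulate`, `n_units`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

@dataclass
class Station:
    tag: int       # unique id of the result this station will produce
    thread: int    # owning thread (selects the register file to write)
    op: str
    dest: str
    operands: List[Union[int, Tuple[str, int]]]  # value, or ("wait", producer tag)

def simulate(programs, init_regs, n_units=2):
    """Toy multi-thread Tomasulo-style loop (illustrative assumption only)."""
    regs = {t: dict(init_regs[t]) for t in programs}  # per-thread register files
    pending = {t: {} for t in programs}               # register -> in-flight tag
    pcs = {t: 0 for t in programs}
    stations, next_tag, cycles = [], 0, 0
    while stations or any(pcs[t] < len(programs[t]) for t in programs):
        cycles += 1
        # Execute: up to n_units stations whose operands are all values.
        ready = [s for s in stations
                 if all(isinstance(o, int) for o in s.operands)][:n_units]
        results = []
        for s in ready:
            stations.remove(s)
            results.append((s, OPS[s.op](*s.operands)))
        # Issue: one instruction per thread per cycle, in program order.
        for t, prog in programs.items():
            if pcs[t] < len(prog):
                op, dest, a, b = prog[pcs[t]]
                operands = [("wait", pending[t][r]) if r in pending[t]
                            else regs[t][r] for r in (a, b)]
                stations.append(Station(next_tag, t, op, dest, operands))
                pending[t][dest] = next_tag  # rename dest to this station's tag
                next_tag += 1
                pcs[t] += 1
        # Broadcast: forward each result to waiting stations and register files.
        for s, value in results:
            for w in stations:
                w.operands = [value if o == ("wait", s.tag) else o
                              for o in w.operands]
            if pending[s.thread].get(s.dest) == s.tag:
                del pending[s.thread][s.dest]
                regs[s.thread][s.dest] = value
    return regs, cycles

# Two independent threads sharing the same function units.
programs = {
    0: [("add", "r3", "r1", "r2"), ("mul", "r4", "r3", "r3")],
    1: [("mul", "r3", "r1", "r2")],
}
init_regs = {0: {"r1": 2, "r2": 3}, 1: {"r1": 4, "r2": 5}}
regs, cycles = simulate(programs, init_regs)
print(regs[0]["r4"], regs[1]["r3"], cycles)
```

Because each thread has its own register file and rename table, instructions from different threads never depend on each other and can fill function units that a single thread would leave idle. Varying `n_units` changes only the cycle count, not the computed register values.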