The Journal of Supercomputing
Conventional stream architectures focus on exploiting ILP and DLP in applications, although the stream model also exposes abundant TLP at kernel granularity. At the same time, advances in modern VLSI technology, growing application demands, and scalability requirements challenge conventional stream architectures. In this paper, we present a novel Tiled Multi-Core Stream Architecture called TiSA. TiSA introduces the tile, which consists of multiple stream cores, as a new category of architectural resource, and provides an on-chip network to support stream transfer among tiles. In TiSA, multiple levels of parallelism are exploited at different granularities of processing elements. Besides the hardware modules, this paper also discusses other key issues of the TiSA architecture, including the programming model, the various execution patterns, and resource allocation. We then evaluate the hardware scalability of TiSA by scaling it from tens to thousands of ALUs and estimating its area and delay cost. We also evaluate the software scalability of TiSA by simulating 6 stream applications and comparing sustained performance against other stream processors, general-purpose processors, and different configurations of TiSA. A 256-ALU TiSA with 4 tiles and 4 stream cores per tile is shown to be feasible in 45 nanometer technology, sustaining 100~350 GFLOP/s on most stream benchmarks and providing ~10x speedup over a 16-ALU TiSA with a 5% degradation in area per ALU. The results show that TiSA is a VLSI- and performance-efficient architecture for the billion-transistor era.
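The scaling figures quoted in the abstract can be checked with a little arithmetic. The sketch below is only illustrative: the class `TisaConfig` and the function `scaling_efficiency` are hypothetical names, the value of 16 ALUs per stream core is inferred from 256 / (4 x 4) under the assumption that ALUs are spread evenly, and the 16-ALU baseline is assumed to be a single tile with a single stream core, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TisaConfig:
    """Hypothetical description of a TiSA configuration (not from the paper)."""
    tiles: int
    cores_per_tile: int
    alus_per_core: int  # assumed uniform across stream cores

    @property
    def total_alus(self) -> int:
        # Total ALUs = tiles x stream cores per tile x ALUs per stream core.
        return self.tiles * self.cores_per_tile * self.alus_per_core


def scaling_efficiency(speedup: float, small: TisaConfig, large: TisaConfig) -> float:
    """Fraction of ideal (linear in ALU count) speedup that is achieved."""
    ideal = large.total_alus / small.total_alus
    return speedup / ideal


# 256-ALU configuration from the abstract: 4 tiles, 4 stream cores per tile.
large = TisaConfig(tiles=4, cores_per_tile=4, alus_per_core=16)
# Assumed layout of the 16-ALU baseline (one tile, one core).
small = TisaConfig(tiles=1, cores_per_tile=1, alus_per_core=16)

# Roughly 10x speedup on 16x the ALUs.
efficiency = scaling_efficiency(10.0, small, large)

# A 5% degradation in area per ALU means the total area grows by about
# 16 x 1.05 relative to the baseline's ALU area.
relative_area = (large.total_alus / small.total_alus) * 1.05

print(large.total_alus, efficiency, relative_area)
```

Under these assumptions, a ~10x speedup on 16x the ALUs corresponds to a scaling efficiency of about 0.625, while total ALU area grows by slightly more than the ALU count.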