Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper presents a classification scheme for data-parallel program attributes, and proposes micro-architectural mechanisms to support applications with diverse behavior using a single reconfigurable architecture. We focus on the following four broad kinds of data-parallel programs: DSP/multimedia, scientific, networking, and real-time graphics workloads. While all of these programs exhibit high computational intensity, coarse-grain regular control behavior, and some regular memory access behavior, they show wide variance in the computation requirements, fine-grain control behavior, and the frequency of other types of memory accesses. Based on this study of application attributes, this paper proposes a set of general micro-architectural mechanisms that enable a baseline architecture to be dynamically tailored to the demands of a particular application. These mechanisms provide efficient execution across a spectrum of data-parallel applications and can be applied to diverse architectures ranging from vector cores to conventional superscalar cores. Our results using a baseline TRIPS processor show that the configurability of the architecture to the application demands provides a harmonic mean performance improvement of 5%-55% over scalable yet less flexible architectures, and performs competitively against specialized architectures.
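The abstract reports its headline result as a harmonic mean, so a brief sketch of how such a summary figure is typically computed may help. The snippet below is illustrative only: the kernel names and speedup values are hypothetical placeholders, not measurements from the paper, and the helper name `harmonic_mean_speedup` is an assumption introduced here.

```python
def harmonic_mean_speedup(speedups):
    """Return the harmonic mean n / sum(1/x) of per-program speedups."""
    values = list(speedups)
    if not values or any(v <= 0 for v in values):
        raise ValueError("speedups must be a non-empty collection of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical per-kernel speedups of a configured design over a fixed baseline.
# These numbers are made up for illustration and do not come from the paper.
example_speedups = {
    "kernel_a": 1.10,
    "kernel_b": 1.50,
    "kernel_c": 1.25,
    "kernel_d": 2.00,
}

hm = harmonic_mean_speedup(example_speedups.values())
improvement_pct = (hm - 1.0) * 100.0
print(f"harmonic mean speedup: {hm:.3f} ({improvement_pct:.1f}% improvement)")
```

The harmonic mean is never larger than the arithmetic mean of the same values, so one large speedup cannot dominate the summary. That is a common reason to prefer it when averaging ratios such as speedups.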