ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
ACM SIGARCH Computer Architecture News
A portable global optimizer and linker
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Can dataflow subsume von Neumann computing?
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Code generation for streaming: an access/execute mechanism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)
WM FIFOs: Size Analysis
Implementation Independent Architectural Comparison
Implementation Independent Architectural Comparison
The WM Computer Architecture
The WM Family of computer Architectures
The WM Family of computer Architectures
The effectiveness of decoupling
ICS '93 Proceedings of the 7th international conference on Supercomputing
Compiling and optimizing for decoupled architectures
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Design and evaluation of dynamic access ordering hardware
ICS '96 Proceedings of the 10th international conference on Supercomputing
Techniques for extracting instruction level parallelism on MIMD architectures
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A comparison of data prefetching on an access decoupled and superscalar machine
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
Evaluating the Use of Register Queues in Software Pipelined Loops
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Multithreading decoupled architectures for complexity-effective general purpose computing
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
MediaBreeze: a decoupled architecture for accelerating multimedia applications
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Memory Latency Effects in Decoupled Architectures
IEEE Transactions on Computers
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Program balance and its impact on high performance RISC architectures
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Reflections on the memory wall
Proceedings of the 1st conference on Computing frontiers
Automatic Thread Extraction with Decoupled Software Pipelining
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
Facilitating compiler optimizations through the dynamic mapping of alternate register structures
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Design and implementation of a queue compiler
Microprocessors & Microsystems
Efficient compilation for queue size constrained queue processors
Parallel Computing
Compiler Support for Code Size Reduction Using a Queue-Based Processor
Transactions on High-Performance Embedded Architectures and Compilers II
Code density concerns for new architectures
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Design and effectiveness of small-sized decoupled dispatch queues
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 0.01 |
This report describes the results of studies of the WM architecture—its performance, the values of some of its key architectural parameters, the difficulty of compiling for it, and hardware implementation complexity. The studies confirm that, with comparable chip area and without heroic compiler technology, WM is capable of outperforming traditional scalar architectures by factors of 2-9. They also underscore the need to devise higher bandwidth memory systems.