Decoupled access/execute computer architectures
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Coding guidelines for pipelined processors
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
Information content of CPU memory referencing behavior
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Design of a Computer—The Control Data 6600
Validity of the single processor approach to achieving large scale computing capabilities
AFIPS '67 (Spring) Proceedings of the April 18-20, 1967, spring joint computer conference
A Simulation Study of Decoupled Architecture Computers
IEEE Transactions on Computers
A study of scalar compilation techniques for pipelined supercomputers
ACM Transactions on Mathematical Software (TOMS)
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture
Code generation for streaming: an access/execute mechanism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Memory latency effects in decoupled architectures with a single data memory module
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluation of the WM architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Register requirements of pipelined processors
ICS '92 Proceedings of the 6th international conference on Supercomputing
The effectiveness of decoupling
ICS '93 Proceedings of the 7th international conference on Supercomputing
Effects of memory latencies on non-blocking processor/cache architectures
ICS '93 Proceedings of the 7th international conference on Supercomputing
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
Proceedings of the 24th annual international symposium on Computer architecture
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
An investigation of static versus dynamic scheduling
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
PIPE: a VLSI decoupled architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Improving Java performance using hardware translation
ICS '01 Proceedings of the 15th international conference on Supercomputing
Improving Latency Tolerance of Multithreading through Decoupling
IEEE Transactions on Computers
MediaBreeze: a decoupled architecture for accelerating multimedia applications
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing
Dynamic Code Partitioning for Clustered Architectures
International Journal of Parallel Programming
Memory Latency Effects in Decoupled Architectures
IEEE Transactions on Computers
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
Non-Consistent Dual Register Files to Reduce Register Pressure
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Decoupled vector architectures
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Trace Based Evaluation of Speculative Branch Decoupling
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Overcoming the limitations of conventional vector processors
Proceedings of the 30th annual international symposium on Computer architecture
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Dual-pipeline heterogeneous ASIP design
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Lessons Learned from Model Checking a NASA Robot Controller
Formal Methods in System Design
Exploiting Coarse-Grain Verification Parallelism for Power-Efficient Fault Tolerance
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Customization of application specific heterogeneous multi-pipeline processors
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Hardware support for software controlled multithreading
ACM SIGARCH Computer Architecture News
Facilitating compiler optimizations through the dynamic mapping of alternate register structures
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Explicit data organization SIMD instruction set architecture for media processors
PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
Neural, Parallel & Scientific Computations
FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A highly efficient implementation of a backpropagation learning algorithm using matrix ISA
Journal of Parallel and Distributed Computing
SoC-C: efficient programming abstractions for heterogeneous multicore systems on chip
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
A performance-correctness explicitly-decoupled architecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
FastCrypto: parallel AES pipelines extension for general-purpose processors
Neural, Parallel & Scientific Computations
FPGA implementation and performance evaluation of a high throughput crypto coprocessor
Journal of Parallel and Distributed Computing
TL-DAE: thread-level decoupled access/execution for OpenMP on the cyclops-64 many-core processor
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Mat-core: a decoupled matrix core extension for general-purpose processors
Neural, Parallel & Scientific Computations
Boosting mobile GPU performance with a decoupled access/execute fragment processor
Proceedings of the 39th Annual International Symposium on Computer Architecture
Runtime dependency analysis for loop pipelining in high-level synthesis
Proceedings of the 50th Annual Design Automation Conference
A shared matrix unit for a chip multi-core processor
Journal of Parallel and Distributed Computing