Modern wireless devices commonly run two broad classes of compute-intensive applications: those with abundant data-level parallelism, such as the signal processing used in wireless baseband applications, and those with little data-level parallelism, such as encryption. Wide single-instruction, multiple-data (SIMD) processors have become popular as high-performance yet power-efficient data engines for applications with abundant data parallelism. However, non-data-parallel applications are relegated to a low-performance scalar datapath on these data engines while the SIMD resources sit idle. To accelerate both types of applications, we propose a more flexible SIMD datapath called SIMD-Morph. In SIMD-Morph, code with data-level parallelism executes across the lanes in the traditional manner, but the lanes can also be morphed into a feed-forward subgraph accelerator that executes scalar applications more efficiently. The morphed SIMD lanes form an accelerator that exploits both instruction-level parallelism and operation chaining, improving the performance of scalar code using the resources already available in the SIMD lanes. Experimental results show a 2.6X performance improvement for purely non-SIMD applications and a 1.4X improvement for the non-SIMD-ized portions of applications with data parallelism.
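The two execution modes described above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the SIMD-Morph hardware design: the function names `simd_execute` and `morphed_execute`, the node encoding, and the four-operation set are all hypothetical and chosen for illustration. In the morphed mode, each lane evaluates one node of a feed-forward subgraph and may read the result of any earlier lane, which is one simple way to picture operation chaining.

```python
import operator

# Hypothetical operation set for this sketch; the source does not list one.
OPS = {
    "add": operator.add,
    "xor": operator.xor,
    "and": operator.and_,
    "shl": operator.lshift,
}


def simd_execute(op, a, b):
    """Traditional mode: every lane applies the same operation to its own element."""
    return [OPS[op](x, y) for x, y in zip(a, b)]


def morphed_execute(subgraph, inputs):
    """Morphed mode (toy model): lane i evaluates node i of a feed-forward subgraph.

    Each node is (op, src_a, src_b). A source is ("in", k) for external
    input k, or ("lane", j) for the result of an earlier lane j < i.
    Reading an earlier lane's result models operation chaining.
    """
    lane_out = []
    for i, (op, src_a, src_b) in enumerate(subgraph):
        operands = []
        for kind, idx in (src_a, src_b):
            if kind == "lane":
                if idx >= i:
                    # Feed-forward restriction: data may only flow to later lanes.
                    raise ValueError(f"lane {i} cannot read lane {idx}")
                operands.append(lane_out[idx])
            else:
                operands.append(inputs[idx])
        lane_out.append(OPS[op](*operands))
    return lane_out


# Data-parallel code: one operation applied across four lanes.
vec_sum = simd_execute("add", [1, 2, 3, 4], [10, 20, 30, 40])

# Scalar code: ((a ^ b) + (c & d)) << e mapped onto four chained lanes.
# Lanes 0 and 1 are independent (instruction-level parallelism);
# lanes 2 and 3 consume earlier results (operation chaining).
subgraph = [
    ("xor", ("in", 0), ("in", 1)),
    ("and", ("in", 2), ("in", 3)),
    ("add", ("lane", 0), ("lane", 1)),
    ("shl", ("lane", 2), ("in", 4)),
]
chained = morphed_execute(subgraph, [6, 3, 12, 10, 1])

print(vec_sum)      # [11, 22, 33, 44]
print(chained[-1])  # 26
```

The model evaluates lanes sequentially for simplicity; it says nothing about timing, lane interconnect, or how a compiler would map subgraphs onto lanes, all of which the abstract leaves unspecified.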