Introduction to parallel computing: design and analysis of algorithms
Introduction to parallel computing: design and analysis of algorithms
Communications of the ACM
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Parallel Processing: From Applications to Systems
Parallel Processing: From Applications to Systems
DSP Processors Hit the Mainstream
Computer
A new parallel DSP with short-vector memory architecture
ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 04
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
An interleaved cache clustered VLIW processor
ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Embedded processor design challenges
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Interface Design Techniques for Single-Chip Systems
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
Instruction Replication for Clustered Microarchitectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors
Proceedings of the 1st conference on Computing frontiers
Extended Split-Issue: Enabling Flexibility in the Hardware Implementation of NUAL VLIW DSPs
Proceedings of the 31st annual international symposium on Computer architecture
Efficient orchestration of sub-word parallelism in media processors
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Removing communications in clustered microarchitectures through instruction replication
ACM Transactions on Architecture and Code Optimization (TACO)
Demystifying on-the-fly spill code
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Future wireless convergence platforms
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploiting Vector Parallelism in Software Pipelined Loops
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
A Low-Power Multithreaded Processor for Software Defined Radio
Journal of VLSI Signal Processing Systems
Virtual Cluster Scheduling Through the Scheduling Graph
Proceedings of the International Symposium on Code Generation and Optimization
Heterogeneous Clustered VLIW Microarchitectures
Proceedings of the International Symposium on Code Generation and Optimization
Vector processing as an enabler for software-defined radio in handheld devices
EURASIP Journal on Applied Signal Processing
An integrated ARM and multi-core DSP simulator
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
High-performance and low-power VLIW cores for numerical computations
International Journal of High Performance Computing and Networking
Configurable data memory for multimedia processing
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
From SODA to scotch: The evolution of a wireless baseband processor
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Trends in low power handset software defined radio
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Compiler-assisted power optimization for clustered VLIW architectures
Parallel Computing
A low-power DSP for wireless communications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Analyzing the Next Generation Software Defined Radio for Future Architectures
Journal of Signal Processing Systems
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.01 |
This highly parallel DSP architecture based on a short-vector memory system incorporates techniques found in general-purpose computing. It promises sustained performance close to its peak computational rates of 900 Mflops (32-bit floating-point) or 3.6 BOPS (16-bit fixed-point).