An innovative low-power high-performance programmable signal processor for digital communications

Authors:
J. H. Moreno;V. Zyuban;U. Shvadron;F. D. Neeser;J. H. Derby;M. S. Ware;K. Kailas;A. Zaks;A. Geva;S. Ben-David;S. W. Asaad;T. W. Fox;D. Littrell;M. Biberstein;D. Naishlos;H. Hunter
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598 (all authors)
Venue:
IBM Journal of Research and Development
Year:
2003

Abstract

We describe an innovative, low-power, high-performance, programmable digital signal processor (DSP) for digital communications. The architecture of this processor is characterized by its explicit design for low-power implementations, its innovative ability to jointly exploit instruction-level parallelism and data-level parallelism to achieve high performance, its suitability as a target for an optimizing high-level language compiler, and its explicit replacement of hardware resources by compile-time practices. We describe the methodology used in the development of the processor, highlighting the techniques deployed to enable application/architecture/compiler/implementation co-development, and the optimization approach and metric used for power-performance evaluation and tradeoff analysis. We summarize the salient features of the architecture, provide a brief description of the hardware organization, and discuss the compiler techniques used to exercise these features. We also summarize the simulation environment and associated software development tools. Coding examples from two representative kernels in the digital communications domain are also provided. The resulting methodology, architecture, and compiler advance the state of the art in low-power, domain-specific microprocessors.