We describe an innovative, low-power, high-performance, programmable digital signal processor (DSP) for digital communications. The architecture of this processor is characterized by its explicit design for low-power implementations, its ability to jointly exploit instruction-level parallelism and data-level parallelism to achieve high performance, its suitability as a target for an optimizing high-level language compiler, and its explicit replacement of hardware resources by compile-time practices. We describe the methodology used in the development of the processor, highlighting the techniques deployed to enable application/architecture/compiler/implementation co-development, and the optimization approach and metric used for power-performance evaluation and tradeoff analysis. We summarize the salient features of the architecture, provide a brief description of the hardware organization, and discuss the compiler techniques used to exercise these features. We also summarize the simulation environment and associated software development tools. Coding examples from two representative kernels in the digital communications domain are also provided. The resulting methodology, architecture, and compiler represent an advance in the state of the art for low-power, domain-specific microprocessors.