A survey of power estimation techniques in VLSI circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low-power design
High-level synthesis techniques for reducing the activity of functional units
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Module assignment for low power
EURO-DAC '96/EURO-VHDL '96 Proceedings of the conference on European design automation
Power macromodeling for high level power estimation
DAC '97 Proceedings of the 34th annual Design Automation Conference
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A power macromodeling technique based on power sensitivity
DAC '98 Proceedings of the 35th annual Design Automation Conference
Selective instruction compression for memory energy reduction in embedded systems
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
A static power estimation methodolodgy for IP-based design
Proceedings of the conference on Design, automation and test in Europe
IMPACT: a high-level synthesis system for low power control-flow intensive circuits
Proceedings of the conference on Design, automation and test in Europe
Exploiting operation level parallelism through dynamically reconfigurable datapaths
Proceedings of the 39th annual Design Automation Conference
Low power system scheduling and synthesis
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
The Garp Architecture and C Compiler
Computer
Imagine: Media Processing with Streams
IEEE Micro
Behavioral Synthesis for low Power
ICCS '94 Proceedings of the1994 IEEE International Conference on Computer Design: VLSI in Computer & Processors
RaPiD - Reconfigurable Pipelined Datapath
FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers
The Chimaera reconfigurable functional unit
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
A MATLAB Compiler for Distributed, Heterogeneous, Reconfigurable Computing Systems
FCCM '00 Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
VLSI Design and Verification of the Imagine Processor
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Energy-efficient instruction set synthesis for application-specific processors
Proceedings of the 2003 international symposium on Low power electronics and design
FCCM '03 Proceedings of the 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Automatic generation of application specific processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
The design of dynamically reconfigurable datapath coprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Behavioral Synthesis of Data-Dominated Circuits for Minimal Energy Implementation
VLSID '05 Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design
An FPGA-based VLIW processor with custom hardware execution
Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
An 8x8 IDCT Implementation on an FPGA-Augmented TriMedia
FCCM '01 Proceedings of the the 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
EURASIP Journal on Applied Signal Processing
Overview of a compiler for synthesizing MATLAB programs onto FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on the 2002 international symposium on low-power electronics and design (ISLPED)
A Markov chain sequence generator for power macromodeling
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
An automated, reconfigurable, low-power RFID tag
Proceedings of the 43rd annual Design Automation Conference
An automated, FPGA-based reconfigurable, low-power RFID tag
Microprocessors & Microsystems
Radio frequency identification prototyping
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A design automation and power estimation flow for RFID systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Interconnect customization for a hardware fabric
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A low-power CMOS thyristor based delay element with programmability extensions
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Automated modeling and emulation of interconnect designs for many-core chip multiprocessors
Proceedings of the 47th Design Automation Conference
Design space exploration for low-power reconfigurable fabrics
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
UNTANGLED: A Game Environment for Discovery of Creative Mapping Strategies
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Hi-index | 0.00 |
Multiprocessor Systems on Chips (MPSoCs) have become a popular architectural technique to increase performance. However, MPSoCs may lead to undesirable power consumption characteristics for computing systems that have strict power budgets, such as PDAs, mobile phones, and notebook computers. This paper presents the super-complex instruction-set computing (SuperCISC) Embedded Processor Architecture and, in particular, investigates performance and power consumption of this device compared to traditional processor architecture-based execution. SuperCISC is a heterogeneous, multicore processor architecture designed to exceed performance of traditional embedded processors while maintaining a reduced power budget compared to low-power embedded processors. At the heart of the SuperCISC processor is a multicore VLIW (Very Large Instruction Word) containing several homogeneous execution cores/functional units. In addition, complex and heterogeneous combinational hardware function cores are tightly integrated to the core VLIW engine providing an opportunity for improved performance and reduced energy consumption. Our SuperCISC processor core has been synthesized for both a 90-nm Stratix II Field Programmable Gate Aray (FPGA) and a 160-nm standard cell Application-Specific Integrated Circuit (ASIC) fabrication process from OKI, each operating at approximately 167 MHz for the VLIW core. We examine several reasons for speedup and power improvement through the SuperCISC architecture, including predicated control flow, cycle compression, and a reduction in arithmetic power consumption, which we call power compression. Finally, testing our SuperCISC processor with multimedia and signal-processing benchmarks, we show how the SuperCISC processor can provide performance improvements ranging from 7X to 160X with an average of 60X, while also providing orders of magnitude of power improvements for the computational kernels. The power improvements for our benchmark kernels range from just over 40X to over 400X, with an average savings exceeding 130X. By combining these power and performance improvements, our total energy improvements all exceed 1000X. As these savings are limited to the computational kernels of the applications, which often consume approximately 90% of the execution time, we expect our savings to approach the ideal application improvement of 10X.