Modern superscalar datapaths use aggressive execution reordering to exploit instruction-level parallelism. Comparators, either explicit or embedded in content-addressable logic, are used extensively throughout such designs to implement several key out-of-order execution mechanisms and to support the memory hierarchy. Traditional comparator designs dissipate energy on a mismatch in any bit position. Because mismatches occur much more frequently than matches in many situations, considerable energy savings can be gained by using comparators that dissipate energy predominantly on a full match and little or no energy on partial or complete mismatches. This paper makes two contributions. First, we introduce a series of dissipate-on-match comparator designs, including designs for comparing long arguments. Second, we show how the comparators used in modern datapaths can be chosen and organized judiciously, based on microarchitectural-level statistics, to minimize energy dissipation. We use actual layout data and realistic bit patterns of the comparands (obtained from the simulated execution of SPEC 2000 benchmarks) to show the energy impact of the new comparator designs. For the same delay, the proposed 8-bit comparators dissipate 70 percent less energy than the traditional designs when used within issue queues and 73 percent less energy when used within load-store queues. The use of the proposed 6-bit comparators within the dependency checking logic is shown to increase the energy dissipation by 65 percent on average compared to the traditional designs. We also find that a hybrid 32-bit comparator, composed of three traditional 8-bit blocks and one proposed 8-bit block, is the most energy-efficient solution for use in the load-store queue, resulting in a 19 percent energy reduction compared to a 32-bit comparator built from four traditional 8-bit blocks.
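The contrast between the two comparator styles can be illustrated with a toy event count. The sketch below is not the paper's circuit or energy model. It assumes a simplified rule in which a traditional comparator dissipates on any comparison with at least one mismatching bit, while a dissipate-on-match comparator dissipates only on a full match. The function names and the tag values are hypothetical and chosen purely for illustration.

```python
def mismatches(a: int, b: int, width: int = 8) -> int:
    """Number of bit positions in which two width-bit comparands differ."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def dissipating_events(pairs, width: int = 8):
    """Count comparisons that dissipate energy under each toy rule.

    Toy assumption (not taken from the paper's circuits): a traditional
    comparator dissipates on any mismatch, a dissipate-on-match
    comparator dissipates only on a full match.
    """
    traditional = sum(1 for a, b in pairs if mismatches(a, b, width) > 0)
    on_match = sum(1 for a, b in pairs if mismatches(a, b, width) == 0)
    return traditional, on_match


# Hypothetical tag stream: one broadcast tag compared against four entries.
pairs = [(0x2A, 0x2A), (0x2A, 0x13), (0x2A, 0x2B), (0x2A, 0x7F)]
traditional, on_match = dissipating_events(pairs)
print(traditional, on_match)
```

Under this rule, whichever outcome is more common in a given structure determines which comparator style dissipates more often, which is why the choice is best made from microarchitectural-level match and mismatch statistics rather than fixed in advance.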