Increasing power efficiency of multi-core network processors through data filtering

Authors:
Gokhan Memik; William H. Mangione-Smith
Affiliations:
University of California, Los Angeles; University of California, Los Angeles
Venue:
CASES '02: Proceedings of the 2002 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
Year:
2002

Abstract

We propose and evaluate a data filtering method to reduce the power consumption of high-end processors with multiple execution cores. Although the method can be applied to a wide variety of multiprocessor systems, including MPPs, SMPs, and any type of single-chip multiprocessor, we concentrate on network processors. The method uses an execution unit called the Data Filtering Engine, which processes data with low temporal locality before it is placed on the system bus. The execution cores use locality to decide which load instructions have low temporal locality and which portion of the surrounding code should be off-loaded to the Data Filtering Engine.

Our technique reduces power consumption because (a) data with low temporal locality is processed on the Data Filtering Engine before it is placed onto the high-capacitance system bus, and (b) the conflict misses caused by such data are reduced, resulting in fewer accesses to the L2 cache. Specifically, we show that our technique reduces bus accesses in representative applications by as much as 46.8% (26.5% on average) and reduces overall power by as much as 15.6% (8.6% on average) on a single-core processor. It also improves performance by as much as 76.7% (29.7% on average) for a processor with 16 execution cores.
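
The abstract does not say how load instructions with low temporal locality are identified. As a rough illustration only, the sketch below flags loads whose accesses rarely touch a recently used cache line. The function name `low_locality_loads`, the trace format, and the `window`, `line_bytes`, and `threshold` parameters are all assumptions made for this example and are not taken from the paper.

```python
from collections import OrderedDict, defaultdict


def low_locality_loads(trace, window=64, line_bytes=32, threshold=0.1):
    """Return the set of load PCs whose accesses rarely reuse a recent line.

    trace      -- iterable of (pc, address) pairs, one per executed load
    window     -- number of recently touched cache lines remembered (assumed)
    line_bytes -- cache line size in bytes (assumed)
    threshold  -- loads with a reuse fraction below this are flagged (assumed)
    """
    recent = OrderedDict()      # cache line -> None, kept in LRU order
    hits = defaultdict(int)     # pc -> accesses that reused a recent line
    total = defaultdict(int)    # pc -> all accesses made by that load
    for pc, addr in trace:
        line = addr // line_bytes
        total[pc] += 1
        if line in recent:
            hits[pc] += 1
            recent.move_to_end(line)        # mark line as most recently used
        else:
            recent[line] = None
            if len(recent) > window:
                recent.popitem(last=False)  # evict the least recently used line
    return {pc for pc in total if hits[pc] / total[pc] < threshold}


# Hypothetical trace: the load at pc 0x10 streams through fresh lines,
# while the load at pc 0x20 keeps re-reading the same line.
trace = []
for i in range(100):
    trace.append((0x10, 0x1000 + i * 32))
    trace.append((0x20, 0x8000))

flagged = low_locality_loads(trace)
```

In this toy trace only the streaming load is flagged, which makes it the kind of load one might consider off-loading together with its surrounding code. A real selection scheme would also need to weigh how much of that code can run on the filtering engine, which this sketch does not model.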