L-CBF: a low-power, fast counting bloom filter architecture

Authors:
Elham Safi; Andreas Moshovos; Andreas Veneris
Affiliations:
University of Toronto; University of Toronto; University of Toronto
Venue:
Proceedings of the 2006 international symposium on Low power electronics and design
Year:
2006

Citing 7
Cited 2

Built-in test for VLSI: pseudorandom techniques
Bloom filtering cache misses for accurate data speculation and prefetching

ICS '02 Proceedings of the 16th international conference on Supercomputing
Synchronous Up/Down Counter with Clock Period Independent of Counter Size

ARITH '97 Proceedings of the 13th Symposium on Computer Arithmetic (ARITH '97)
Low-Power SRAM Circuit Design

MTDT '99 Proceedings of the 1999 IEEE International Workshop on Memory Technology, Design, and Testing
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Scalable Hardware Memory Disambiguation for High-ILP Processors

IEEE Micro
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Proceedings of the 32nd annual international symposium on Computer Architecture

Late-binding: enabling unordered load-store queues

Proceedings of the 34th annual international symposium on Computer architecture
L-CBF: a low-power, fast counting bloom filter architecture

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Abstract

We study the energy, latency, and area characteristics of two Counting Bloom Filter (CBF) implementations using full custom layouts in a commercial 0.13 μm technology. The first implementation, S-CBF, uses an SRAM array of counts and a shared counter. The second, L-CBF, uses an array of up/down linear feedback shift registers. Circuit-level simulations demonstrate that for a 1K-entry CBF with a 15-bit count per entry, L-CBF is 3.7 or 1.6 times faster than S-CBF, depending on the operation. L-CBF requires 2.3 or 1.4 times less energy per operation than S-CBF. However, L-CBF requires 3.2 times more area. We demonstrate that for one application of CBFs (early hit/miss detection for L1 caches [12] in an aggressive dynamically-scheduled superscalar processor), the energy consumed by L-CBF is 60% of the energy consumed by S-CBF for most of the SPEC CPU 2000 benchmarks.
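
To make the L-CBF idea concrete, here is a minimal software sketch of a counting Bloom filter whose per-entry counts are up/down linear feedback shift registers rather than binary counters. It is a behavioral illustration only, not the authors' circuit. The abstract gives the 1K entries and the 15-bit count width; everything else is an assumption made for the example: the feedback taps (bits 14 and 13, which yield a maximal-length sequence), the choice of state 1 to encode a count of zero, the modulo index function, and the names `lfsr_up`, `lfsr_down`, and `LfsrCbf`.

```python
WIDTH = 15                  # bits per count, as in the abstract's example
MASK = (1 << WIDTH) - 1
ZERO = 1                    # assumed LFSR state that encodes a count of zero


def lfsr_up(state):
    """Increment: shift left and feed back bit 14 XOR bit 13 (assumed taps)."""
    feedback = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | feedback) & MASK


def lfsr_down(state):
    """Decrement: the exact inverse of lfsr_up."""
    top = (state ^ (state >> 14)) & 1   # recovers the bit shifted out by lfsr_up
    return (state >> 1) | (top << 14)


class LfsrCbf:
    """Counting Bloom filter with one up/down LFSR per entry (single index)."""

    def __init__(self, entries=1024):
        self.counts = [ZERO] * entries

    def _index(self, key):
        # Hypothetical hash: low-order bits of an integer key such as an address.
        return key % len(self.counts)

    def insert(self, key):
        i = self._index(key)
        self.counts[i] = lfsr_up(self.counts[i])

    def delete(self, key):
        # The caller is assumed to delete only keys that were inserted earlier.
        i = self._index(key)
        self.counts[i] = lfsr_down(self.counts[i])

    def may_contain(self, key):
        # A zero count means "definitely absent"; nonzero means "possibly present".
        return self.counts[self._index(key)] != ZERO


cbf = LfsrCbf()
cbf.insert(0x1234)
cbf.insert(0x1234)
cbf.delete(0x1234)
print(cbf.may_contain(0x1234))   # one insertion is still outstanding
cbf.delete(0x1234)
print(cbf.may_contain(0x1234))   # count is back at the zero state
```

The stored LFSR state is not the numeric count; only "equal to the zero state or not" matters for a membership test. That is why a shift register, which needs a single XOR per step instead of a carry chain, can stand in for a conventional counter.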