Associative memories can map sparsely used keys to values with low latency, but they can incur heavy area overheads. Today's mainstream FPGAs lack customized hardware for associative memories, which raises the cost of building them out of BRAMs that support only fixed-address matching. In this paper, we develop a new, FPGA-friendly memory architecture based on a multiple-hash scheme that achieves near-associative performance (less than 5% of evictions due to conflicts) without the area overheads of a fully associative memory on FPGAs. Using the proposed architecture as a 64KB L1 data cache, we show that, for a set of benchmark programs from the SPEC2006 suite, it achieves near-associative miss rates while consuming 6-7× less FPGA memory resources than fully associative memories generated by the Xilinx Coregen tool. The benefits increase with match width, allowing area reductions of up to 100×. At the same time, the new architecture has lower latency than the fully associative memory: for a 64b key, 3.7 ns for a 1024-entry flat version or 6.1 ns for an area-efficient version, compared to 8.8 ns for a fully associative memory.
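To make the idea of a multiple-hash memory concrete, the following is a minimal software sketch of a generic multiple-choice hashing scheme. It is not the architecture developed in the paper: the class name `MultiHashMemory`, the use of four tables, the BLAKE2b-based index function, and the "overwrite table 0" eviction rule are all assumptions chosen for illustration. Each table stands in for one direct-addressed memory bank, a key has one candidate slot per table, and a conflict eviction is counted only when every candidate slot is already occupied by another key.

```python
import hashlib


class MultiHashMemory:
    """Toy model of a key/value memory built from several hashed tables.

    Illustrative only: table count, hash function, and eviction policy
    are assumptions, not details taken from the paper.
    """

    def __init__(self, num_tables=4, slots_per_table=256):
        self.num_tables = num_tables
        self.slots = slots_per_table
        # One list per table; each slot holds (key, value) or None.
        self.tables = [[None] * slots_per_table for _ in range(num_tables)]
        self.conflict_evictions = 0

    def _index(self, table_id, key):
        # Salt the hash with the table id so each table uses a different
        # mapping. Keys are assumed to be unsigned integers below 2**64.
        digest = hashlib.blake2b(
            key.to_bytes(8, "little"),
            digest_size=8,
            salt=bytes([table_id]),
        ).digest()
        return int.from_bytes(digest, "little") % self.slots

    def lookup(self, key):
        # Probe the single candidate slot in every table.
        for t in range(self.num_tables):
            entry = self.tables[t][self._index(t, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        candidates = [(t, self._index(t, key)) for t in range(self.num_tables)]
        # Update in place if the key is already stored.
        for t, i in candidates:
            entry = self.tables[t][i]
            if entry is not None and entry[0] == key:
                self.tables[t][i] = (key, value)
                return
        # Otherwise take the first empty candidate slot.
        for t, i in candidates:
            if self.tables[t][i] is None:
                self.tables[t][i] = (key, value)
                return
        # Every candidate is occupied: evict the entry in table 0.
        self.conflict_evictions += 1
        t, i = candidates[0]
        self.tables[t][i] = (key, value)


memory = MultiHashMemory()
for k in (1, 2, 3, 4):
    memory.insert(k, f"value-{k}")
print(memory.lookup(3), memory.conflict_evictions)
```

Because each key has one candidate slot in each of the four tables, a conflict eviction can occur only when four other entries already occupy those slots, which is why the four insertions above cannot trigger one. In hardware the candidate slots would presumably be probed in parallel rather than in a loop; the sequential loop here is only a software convenience.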