Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Sorting and Searching Using Ternary CAMs
HOTI '02 Proceedings of the 10th Symposium on High Performance Interconnects HOT Interconnects
ScalParC: A New Scalable and Efficient Parallel Classification Algorithm for Mining Large Datasets
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Algorithms for advanced packet classification with ternary CAMs
Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications
New Generation of Predictive Technology Model for Sub-45nm Design Exploration
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Similarity search and locality sensitive hashing using ternary content addressable memories
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Small subset queries and bloom filters using ternary associative memories, with applications
Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Design of spin-torque transfer magnetoresistive RAM and CAM/TCAM with high sensing and search speed
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A resistive TCAM accelerator for data-intensive computing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A content-addressable memory architecture for image coding usingvector quantization
IEEE Transactions on Signal Processing
On using the CAM concept for parametric curve extraction
IEEE Transactions on Image Processing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
With technology scaling, on-chip power dissipation and off-chip memory bandwidth have become significant performance bottlenecks in virtually all computer systems, from mobile devices to supercomputers. An effective way of improving performance in the face of bandwidth and power limitations is to rely on associative memory systems. Recent work on a PCM-based, associative TCAM accelerator shows that associative search capability can reduce both off-chip bandwidth demand and overall system energy. Unfortunately, previously proposed resistive TCAM accelerators have limited flexibility: only a restricted (albeit important) class of applications can benefit from a TCAM accelerator, and the implementation is confined to resistive memory technologies with a high dynamic range (RHigh/RLow), such as PCM. This work proposes AC-DIMM, a flexible, high-performance associative compute engine built on a DDR3-compatible memory module. AC-DIMM addresses the limited flexibility of previous resistive TCAM accelerators by combining two powerful capabilities---associative search and processing in memory. Generality is improved by augmenting a TCAM system with a set of integrated, user programmable microcontrollers that operate directly on search results, and by architecting the system such that key-value pairs can be co-located in the same TCAM row. A new, bit-serial TCAM array is proposed, which enables the system to be implemented using STT-MRAM. AC-DIMM achieves a 4.2X speedup and a 6.5X energy reduction over a conventional RAM-based system on a set of 13 evaluated applications.