Traffic management: a holistic approach to memory placement on NUMA systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
MemC3: compact and concurrent MemCache with dumber caching and smarter hashing
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Mercury: bringing efficiency to key-value stores
Proceedings of the 6th International Systems and Storage Conference
Thin servers with smart pipes: designing SoC accelerators for memcached
Proceedings of the 40th Annual International Symposium on Computer Architecture
Leveraging hardware message passing for efficient thread synchronization
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
MICA: a holistic approach to fast in-memory key-value storage
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
CPHash is a concurrent hash table for multicore processors. CPHash partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHash's message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on a 80-core machine with 2 hardware threads per core show that CPHash has ~1.6x higher throughput than a hash table implemented using fine-grained locks. An analysis shows that CPHash wins because it experiences fewer cache misses and its cache misses are less expensive, because of less contention for the on-chip interconnect and DRAM. CPServer, a key/value cache server using CPHash, achieves ~5% higher throughput than a key/value cache server that uses a hash table with fine-grained locks, but both achieve better throughput and scalability than memcached. The throughput of CPHash and CPServer also scale near-linearly with the number of cores.