For many "Big Data" applications, the limiting factor in performance is often the transportation of large amounts of data from hard disks to where it can be processed, i.e., DRAM. In this paper we examine an architecture for a scalable distributed flash store that aims to overcome this limitation in two ways. First, the architecture provides high-performance, high-capacity, scalable random-access storage. It achieves high throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible compared to flash access time. Second, it permits some computation near the data via an FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and in which the average latency for user software to access the flash store is less than 70 µs, including 3.5 µs of network overhead.
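As a rough check on the claim that network latency is negligible, the two latency figures quoted above can be compared directly. The sketch below is illustrative arithmetic only: it treats the 70 µs upper bound as the total, and the function names and the linear bandwidth model are assumptions made for this example, not part of the described system.

```python
# Figures quoted in the abstract. 70 us is an upper bound on average latency,
# so the computed share is approximate.
TOTAL_LATENCY_US = 70.0     # user software to flash store
NETWORK_OVERHEAD_US = 3.5   # portion attributed to the backplane network


def network_share(total_us: float, network_us: float) -> float:
    """Return the fraction of end-to-end latency spent in the network."""
    if total_us <= 0:
        raise ValueError("total latency must be positive")
    return network_us / total_us


def aggregate_bandwidth(per_node: float, nodes: int) -> float:
    """Hypothetical model of bandwidth that scales directly with node count."""
    if nodes < 0:
        raise ValueError("node count must be non-negative")
    return per_node * nodes


share = network_share(TOTAL_LATENCY_US, NETWORK_OVERHEAD_US)
print(f"network share of latency: {share:.1%}")
```

Under these assumptions the network accounts for roughly 5% of the access latency, with the remainder attributable to flash access and the host path.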