The log-structured merge-tree (LSM-tree)
Acta Informatica
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Bigtable: a distributed storage system for structured data
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
A flash-memory based file system
TCON'95 Proceedings of the USENIX 1995 Technical Conference Proceedings
A log buffer-based flash translation layer using fully-associative sector translation
ACM Transactions on Embedded Computing Systems (TECS)
LAST: locality-aware sector translation for NAND flash memory-based storage systems
ACM SIGOPS Operating Systems Review
High performance solid state storage under Linux
MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
SFS: random write considered harmful in solid state drives
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
A space-efficient flash translation layer for CompactFlash systems
IEEE Transactions on Consumer Electronics
Hi-index | 0.00 |
In the last several years hundreds of thousands of SSDs have been deployed in the data centers of Baidu, China's largest Internet search company. Currently only 40\% or less of the raw bandwidth of the flash memory in the SSDs is delivered by the storage system to the applications. Moreover, because of space over-provisioning in the SSD to accommodate non-sequential or random writes, and additionally, parity coding across flash channels, typically only 50-70\% of the raw capacity of a commodity SSD can be used for user data. Given the large scale of Baidu's data center, making the most effective use of its SSDs is of great importance. Specifically, we seek to maximize both bandwidth and usable capacity. To achieve this goal we propose {\em software-defined flash} (SDF), a hardware/software co-designed storage system to maximally exploit the performance characteristics of flash memory in the context of our workloads. SDF exposes individual flash channels to the host software and eliminates space over-provisioning. The host software, given direct access to the raw flash channels of the SSD, can effectively organize its data and schedule its data access to better realize the SSD's raw performance potential. Currently more than 3000 SDFs have been deployed in Baidu's storage system that supports its web page and image repository services. Our measurements show that SDF can deliver approximately 95% of the raw flash bandwidth and provide 99% of the flash capacity for user data. SDF increases I/O bandwidth by 300\% and reduces per-GB hardware cost by 50% on average compared with the commodity-SSD-based system used at Baidu.