An Efficient SSD-based Hybrid Storage Architecture for Large-Scale Search Engines

  • Authors:
  • Ruixuan Li;Chengzhou Li;Weijun Xiao;Hai Jin;Heng He;Xiwu Gu;Kunmei Wen;Zhiyong Xu

  • Venue:
  • ICPP '12 Proceedings of the 2012 41st International Conference on Parallel Processing
  • Year:
  • 2012

Abstract

Large-scale search engines store their massive index data on hard disk drives (HDDs) because of their capacity, but retrieval performance is limited by the relatively low I/O performance of HDDs. Caching is an effective optimization, and many caching algorithms have been proposed to improve retrieval performance. However, given the high cost of memory and the huge volume of data, a capacity-limited in-memory cache cannot fully solve this problem. In this paper, we adopt a solid state disk (SSD) based storage architecture that uses the SSD as a secondary cache for memory. We analyze the I/O patterns of search engines and propose SSD-based data management policies for the hybrid storage architecture, covering data selection, data placement, and data replacement. Our main goal is to improve the performance of search engines while reducing the operation cost inside the SSD. The experimental results demonstrate that the proposed architecture improves the hit ratio by 13.31%, the performance by 41.05%, and the average access time inside the SSD by 43.83%, and reduces block erasure operations by 71.52%.
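
The idea of using an SSD as a secondary cache behind memory can be illustrated with a minimal sketch. The abstract does not specify the paper's selection, placement, or replacement policies, so the code below is not the authors' method: it assumes plain LRU replacement in both tiers, demotes entries evicted from memory into the SSD tier, and promotes SSD hits back into memory. The class name `TwoTierCache`, its parameters, and the dictionary standing in for the HDD-resident index are all hypothetical.

```python
from collections import OrderedDict


class TwoTierCache:
    """Memory cache backed by a larger SSD-like cache (both LRU, by assumption)."""

    def __init__(self, mem_capacity, ssd_capacity, backing_store):
        self.mem = OrderedDict()            # first-level cache (memory)
        self.ssd = OrderedDict()            # second-level cache (stands in for the SSD)
        self.mem_capacity = mem_capacity
        self.ssd_capacity = ssd_capacity
        self.backing_store = backing_store  # stands in for the index on HDD
        self.stats = {"mem_hits": 0, "ssd_hits": 0, "hdd_reads": 0}

    def get(self, key):
        if key in self.mem:
            self.mem.move_to_end(key)       # mark as most recently used
            self.stats["mem_hits"] += 1
            return self.mem[key]
        if key in self.ssd:
            value = self.ssd.pop(key)       # promote the entry to memory
            self.stats["ssd_hits"] += 1
        else:
            value = self.backing_store[key]  # slowest path: read from HDD
            self.stats["hdd_reads"] += 1
        self._put_mem(key, value)
        return value

    def _put_mem(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        if len(self.mem) > self.mem_capacity:
            # Demote the least recently used memory entry to the SSD tier.
            old_key, old_value = self.mem.popitem(last=False)
            self._put_ssd(old_key, old_value)

    def _put_ssd(self, key, value):
        self.ssd[key] = value
        self.ssd.move_to_end(key)
        if len(self.ssd) > self.ssd_capacity:
            self.ssd.popitem(last=False)    # drop the least recently used SSD entry


# Hypothetical index: term -> posting list placeholder.
hdd_index = {f"term{i}": f"postings{i}" for i in range(10)}
cache = TwoTierCache(mem_capacity=2, ssd_capacity=4, backing_store=hdd_index)
for term in ["term0", "term1", "term2", "term0"]:
    cache.get(term)
print(cache.stats)
```

In this run the second request for `term0` is served from the SSD tier rather than the HDD, which is the effect a secondary cache is meant to provide. A policy that also aims to reduce block erasures, as the abstract describes, would need to be more selective about what is written to the SSD than this sketch is.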