Performance evaluation of Linux file systems for data warehousing workloads

  • Authors:
  • Peter Wai Yee Wong; Ric Hendrickson; Haider Rizvi; Steve Pratt

  • Affiliations:
  • IBM, Austin, TX; IBM, Austin, TX; IBM, Markham, ON, Canada; IBM, Austin, TX

  • Venue:
  • InfoScale '06: Proceedings of the 1st International Conference on Scalable Information Systems
  • Year:
  • 2006

Abstract

Many database users store data on raw or block devices for performance reasons, since file caching and file locking by the file system can be bypassed. However, many database users would prefer to use file systems for the ease of long-term maintenance. To our knowledge, there have not been any major efforts to systematically assess the performance of Linux file systems for database workloads. In this paper, we present our initial performance study on data warehousing systems. We first provide a brief introduction to various Linux file systems, namely Ext2, Ext3, ReiserFS, XFS and JFS. We examine the performance impact of asynchronous I/O, direct I/O, file caching, I/O schedulers, file fragmentation, and database storage methods. We then quantify the performance of these Linux file systems utilizing a well-known data warehousing workload. Finally, system configurations are recommended and future work is suggested.