A file I/O system for many-core based clusters

  • Authors:
  • Yuki Matsuo;Taku Shimosawa;Yutaka Ishikawa

  • Affiliations:
  • The University of Tokyo;The University of Tokyo;The University of Tokyo

  • Venue:
  • Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
  • Year:
  • 2012

Abstract

A many-core based co-processor, such as the Intel Many Integrated Core (MIC) Architecture, connected to a server-level multi-core host processor via a PCI Express bus, has recently attracted a great deal of attention. In such a machine, the many-core is separated from the host processor, which handles disk I/O, and it has limited cache and memory bandwidth, so performance degradation can result from cache pollution and data transfer latency caused by processing file operations. Three types of file I/O mechanisms for the many-core in such a system are designed, implemented, and evaluated in this paper. In the first mechanism, the file I/O system calls are performed by the kernel running on the same core as the application program. In the second, those system calls are offloaded to the kernel running on a dedicated core of the many-core that handles file I/O operations. In either case, the kernel requests file data transfer from the file system on the host processor, and file data is cached on the many-core. In the third mechanism, the system calls are offloaded to the kernel running on the host processor, so that the host kernel transfers data directly to the user buffer in the many-core. The experimental results show that the first two mechanisms, which perform the system calls on the many-core, are superior to offloading them to the host when the data size is relatively small, because they conduct file I/O operations through a file cache and fewer communications occur between the many-core and the host. With larger data sizes, however, file I/O system calls offloaded to the host, which transfer data directly to/from the user buffer, perform better than those performed inside the many-core. With regard to cache awareness, it is shown that the user code and part of the file I/O system calls can be performed efficiently when the user buffer data is small enough to fit in the cache.
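The trade-off described in the abstract can be illustrated with a minimal sketch of a size-based dispatcher. This is not the authors' implementation: the names `Mechanism`, `choose_mechanism`, and `ASSUMED_CROSSOVER_BYTES` are hypothetical, the crossover value is an arbitrary placeholder (the abstract gives no number), and the preference for a dedicated core over the same core is purely illustrative, since the abstract does not rank the first two mechanisms against each other.

```python
from enum import Enum


class Mechanism(Enum):
    """The three file I/O paths named in the abstract."""
    SAME_CORE = "kernel on the application's core, file cache on the many-core"
    DEDICATED_CORE = "kernel on a dedicated I/O core, file cache on the many-core"
    HOST_OFFLOAD = "host kernel, direct transfer to/from the user buffer"


# Assumption: placeholder crossover size. The abstract only says
# "relatively small" versus "larger" and does not give a threshold.
ASSUMED_CROSSOVER_BYTES = 1 << 20


def choose_mechanism(nbytes, dedicated_core_available=False,
                     crossover=ASSUMED_CROSSOVER_BYTES):
    """Pick an I/O path from the request size (illustrative policy only)."""
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    if nbytes >= crossover:
        # Larger transfers: direct transfer by the host kernel avoids
        # passing the data through the many-core file cache.
        return Mechanism.HOST_OFFLOAD
    # Smaller transfers: served through the file cache on the many-core,
    # with fewer many-core/host communications.
    # Assumption: use the dedicated core when one exists.
    if dedicated_core_available:
        return Mechanism.DEDICATED_CORE
    return Mechanism.SAME_CORE
```

In a real system the crossover would have to be measured on the target hardware rather than fixed in advance, which is why it is left as a parameter here.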