Pantheon: exascale file system search for scientific computing

Authors:
Joseph L. Naps;Mohamed F. Mokbel;David H. C. Du
Affiliations:
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN;Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN;Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN
Venue:
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Year:
2011

Abstract

Modern scientific computing generates petabytes of data spread across billions of files, all of which must be managed. These files are usually organized by name in the hierarchical directory tree common to most file systems. As the scale of data has increased, this has proven to be a poor method of file organization. Recent tools allow users to navigate files by their metadata attributes, which provides a more meaningful organization. To make this metadata searchable, it is often stored on separate metadata servers. This solution has drawbacks, however, because many large-scale storage solutions use a multi-tiered architecture: as data is modified or moved between tiers of storage, the overhead of keeping those tiers consistent with the metadata server becomes very large. As scientific systems continue to push towards exascale, this problem will become more pronounced. A simpler option is to bypass the metadata server and use the metadata storage inherent to the file system. Currently, however, few tools can perform such operations at large scale. This paper introduces the prototype of Pantheon, a file system search tool designed to use the metadata storage within the file system itself, avoiding the overhead of metadata servers. Pantheon is also designed with the scientific community's push towards exascale computing in mind. It combines hierarchical partitioning, query optimization, and indexing to perform efficient metadata searches over large-scale file systems.