Analysis and Workload Characterization of the CERN EOS Storage System

Appeared in Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems (CHEOPS '22).

Abstract

Modern, large-scale scientific computing runs on exascale storage systems that support complex data workloads. Understanding data access and movement patterns is vital for informing the design of both future iterations of existing systems and next-generation systems. Yet we lack publicly available traces and tools to help us understand even one system in depth, let alone correlate long-term cross-system trends.

In this work, we investigate the workload characteristics of the CERN EOS filesystem, analyzing over 2.49 billion events containing over 300 PB in reads and 150 PB in writes across 11 months. We contrast our findings with analyses from other scientific storage systems, allowing us to observe larger trends that appear over the years and revisit and question conventional wisdom such as "write once, read maybe" and the influence of user actions on system-wide data movement. By studying trace capture mechanisms across these systems, we motivate a standardized trace collection and analysis toolset, so that future researchers can more easily study existing systems to aid in system design.

Publication date:
April 2022

Authors:
Devashish Purandare
Daniel Bittman
Ethan L. Miller

Projects:
Archival Storage
Designing systems for QLC flash
Dynamic Non-Hierarchical File Systems

Available media

Full paper text: PDF
Presentation: video

Bibtex entry

@inproceedings{purandare-cheops22,
  author       = {Devashish Purandare and Daniel Bittman and Ethan L. Miller},
  title        = {Analysis and Workload Characterization of the {CERN} {EOS} Storage System},
  booktitle    = {Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems (CHEOPS '22)},
  month        = apr,
  year         = {2022},
}
Last modified 21 Feb 2023