Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories
Appeared in ACM Transactions on Storage 8(2).
Abstract
The scope of archival systems is expanding beyond cheap tertiary storage: scientific and medical data is
increasingly digital, and the public has a growing desire to digitally record their personal histories. Driven by the increasing cost efficiency of hard drives and the rise of the Internet, content archives have become a means of providing the public with fast, cheap access to long-term data. Unfortunately, designers of purpose-built archival systems are forced either to rely on workload behavior obtained from a narrow, anachronistic view of archives as simply cheap tertiary storage, or to extrapolate from marginally related enterprise workload data and traditional library access patterns.
To close this knowledge gap and provide relevant input for the design of effective long-term data storage
systems, we studied the workload behavior of several systems within this expanded archival storage space.
Our study examined several scientific and historical archives, covering a mixture of purposes, media types,
and access models—that is, public versus private. Our findings show that, for more traditional private
scientific archival storage, files have become larger, but update rates have remained largely unchanged.
However, in the public content archives we observed, we saw behavior that diverges from the traditional
“write-once, read-maybe” behavior of tertiary storage. Our study shows that the majority of such data is
modified—sometimes unnecessarily—relatively frequently, and that indexing services such as Google and
internal data management processes may routinely access large portions of an archive, accounting for most
of the accesses. Based on these observations, we identify areas for improving the efficiency and performance of archival storage systems.
Publication date:
May 2012
Authors:
Ian Adams
Mark W. Storer
Ethan L. Miller
Projects:
Archival Storage
Tracing and Benchmarking
Available media
Full paper text: PDF
Bibtex entry
@article{adams-tos12,
  author  = {Ian Adams and Mark W. Storer and Ethan L. Miller},
  title   = {Analysis of Workload Behavior in Scientific and Historical Long-Term Data Repositories},
  journal = {ACM Transactions on Storage},
  volume  = {8},
  number  = {2},
  month   = may,
  year    = {2012},
}