THIS IS THE ARCHIVED SSRC SITE.
Maintained by Ethan L. Miller.
The current CRSS site is at https://www.crss.us/.

Archival Storage

We have several active and past projects in archival storage, all of which are contributing to the ability to build more efficient, reliable, and secure long-term storage systems.

  • Economic Modeling of Long-Term Storage: Understanding the economics of long-term data preservation is necessary because of the rapid growth of data over time, the prospect of declining storage density growth for some technologies, uncertainty in financial market conditions, and an increasing need for long-term data preservation. Current business models rely on continuous storage density growth and the corresponding decline in cost per byte of storage. Archival storage systems typically preserve data for a long period of time or indefinitely. Long-term data preservation requires a high degree of confidence that data will not be lost over the lifetime of the storage system. However, storage technologies differ in capacity, performance, reliability, and cost, and each has development prospects that may improve or degrade its competitive value for long-term, highly reliable archival storage relative to the others. Additionally, next-generation storage technologies such as data storage in glass and synthetic DNA may revolutionize the archival storage market by providing highly reliable, scalable, dense, and cost-effective storage. We are evaluating and comparing the conditions under which each storage technology gains a cost advantage over the others when meeting a given standard of reliability. We are also examining the effects of novel storage devices such as QLC flash and hard disk drives with separable platters on the cost and reliability of long-term archival storage. Finally, we are comparing existing and prospective storage technologies on the basis of their costs while achieving a variety of reliability standards under a variety of assumptions about how each technology will develop over time.
  • Secure and Searchable Long-Term Storage: As humanity generates ever-increasing amounts of data that must be stored for decades, we must both protect the data from disclosure and allow users to find information. Since long-term storage can potentially be compromised by a single site or person, we distribute data across multiple archive sites, using techniques derived from POTSHARDS. We are investigating techniques that allow this data to be searched without revealing search terms, or even significant correlations between documents, to archive managers. This provides the level of privacy necessary for long-term storage of medical records, sensitive corporate and government data, and personal information such as videos and photos.
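
The cost comparison described under Economic Modeling can be illustrated with a small sketch. It is not the project's model: the function name, its parameters, and every number below are hypothetical, and it makes strong simplifying assumptions (fixed capacity, media cost only, a constant annual price decline, and no accounting for reliability, power, or labor).

```python
def present_value_media_cost(
    capacity_tb: float,
    cost_per_tb: float,
    annual_price_decline: float,
    discount_rate: float,
    service_life_years: int,
    horizon_years: int,
) -> float:
    """Discounted cost of buying media to hold capacity_tb for horizon_years.

    Media is bought in year 0 and replaced every service_life_years. The
    price per TB is assumed to fall by annual_price_decline each year, and
    future purchases are discounted at discount_rate per year.
    """
    total = 0.0
    for year in range(0, horizon_years, service_life_years):
        price = cost_per_tb * (1.0 - annual_price_decline) ** year
        total += capacity_tb * price / (1.0 + discount_rate) ** year
    return total


# Hypothetical inputs, chosen only to exercise the function.
tape = present_value_media_cost(1000, 5.0, 0.15, 0.03, 10, 50)
disk = present_value_media_cost(1000, 20.0, 0.10, 0.03, 5, 50)
```

Varying the assumed price decline or service life for each technology shows how sensitive such a comparison is to assumptions about future development, which is the kind of question the project studies with far more detailed models.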

Status

  • Economic Modeling of Long-Term Storage: We answered these questions in our recent work: 1) How much more cost-effective is tape than other storage technologies for archival storage? 2) Does hard disk drive reliability significantly affect its total cost of ownership within archival systems? 3) To what extent, over the long term, does the fast pace of development for solid state disks mitigate their higher cost per byte of storage? 4) What are the economic impacts of technology choices, over time, on long-term preservation? We are working on other questions relating to archival system reliability, including: 1) What are the chances of data loss using various storage technologies and storage system designs? 2) Does each storage technology present a "sweet spot" of minimal cost relative to other technologies for any particular standard of reliability? 3) How do different assumptions about rates of development affect the total cost of ownership as we vary the amount of reliability needed within a storage system? 4) How will prospective storage technologies compare with existing storage technologies as the demand for both capacity and performance increases, and what are the effects of altering the growth rates of capacity, performance, and reliability for existing technologies?
  • Secure and Searchable Long-Term Storage: We are working towards publishing our initial work on Percival, a framework that leverages pre-indexing, keyed hashing, and Bloom filters to enable blinded searching, which prevents the archive from learning which terms are being queried.
  • Past Projects: The following are projects we have worked on in the past:
    Logan: a management system to scalably grow, maintain, and evolve a heterogeneous archival storage system.
    Computation-Storage Trade-off: using provenance to reduce storage overhead by storing intermediate and initial inputs and recomputing a dataset on demand.
    Pergamum: long-term evolvable storage built from intelligent network-attached bricks with both disk and NVRAM such as flash.
    Deep Store: building more efficient archival storage using deduplication to take advantage of intra-file and inter-file redundancy.
    POTSHARDS: long-term secure storage, which allows the secure preservation of data for decades without relying upon traditional encryption to prevent information leakage.
    Improving Trace Analysis: analyzing long-term traces that highlighted shortcomings in current tracing and analysis techniques, and designing new techniques to improve future traces and analyses.
    Archival Workload Studies: produced several studies of archival storage user behavior and system evolution, and provided relevant, up-to-date observations on archival system usage patterns to guide and validate future archival storage designs.
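
The blinded-search idea mentioned for Percival (keyed hashing combined with Bloom filters) can be sketched roughly as follows. This is a minimal illustration under assumptions, not Percival's actual design, which this page describes only at a high level. The names `blinded_positions` and `ArchiveFilter`, the filter size, and the number of hashes are all hypothetical. The client holds a secret key and turns each term into bit positions; the archive stores and tests only those positions, so it never sees the terms or the key.

```python
import hashlib
import hmac

NUM_BITS = 2048   # hypothetical filter size in bits
NUM_HASHES = 4    # hypothetical number of probes per term


def blinded_positions(key: bytes, term: str) -> list[int]:
    """Client side: map a term to Bloom filter positions with a keyed hash."""
    positions = []
    for i in range(NUM_HASHES):
        message = f"{i}:{term.lower()}".encode("utf-8")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


class ArchiveFilter:
    """Archive side: stores and tests bit positions, never terms or keys."""

    def __init__(self) -> None:
        self.bits = bytearray(NUM_BITS // 8)

    def add(self, positions: list[int]) -> None:
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, positions: list[int]) -> bool:
        # True may be a false positive; False is always definitive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)


# Pre-indexing: the client blinds each term of a document before upload.
key = b"hypothetical-client-secret"
doc_filter = ArchiveFilter()
for word in ["archival", "storage", "tape"]:
    doc_filter.add(blinded_positions(key, word))
```

A query is then `doc_filter.might_contain(blinded_positions(key, "tape"))`. In this simplified form the archive can still observe when the same positions are queried repeatedly, so a real system would need additional measures to hide such correlations.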

Publications

Last modified 27 Jan 2020