Disaster Recovery Codes: Increasing Reliability with Large-Stripe Error Correction Codes

Appeared in Proceedings of the 3rd International Workshop on Storage Security and Survivability (StorageSS 2007). StorageSS 2007 was held in conjunction with the 14th ACM Conference on Computer and Communications Security (CCS 2007).

Abstract

Large-scale storage systems need to provide the right amount of redundancy in their storage scheme to protect client data. In particular, many high-performance systems require data protection that imposes minimal impact on performance; thus, such systems use mirroring to guard against data loss. Unfortunately, as the number of copies increases, mirroring becomes costly and contributes relatively little to the overall system reliability. Compared to mirroring, parity-based schemes are space-efficient, but incur greater update and degraded-mode read costs. An ideal data protection scheme should perform similarly to mirroring, while providing the space efficiency of a parity-based erasure code.
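The space trade-off described above can be made concrete with a little arithmetic. The sketch below is only an illustration, not taken from the paper: the function names and the example stripe width (8 data blocks, 1 parity block) are assumptions chosen for the example. It computes the fraction of raw capacity that holds user data under n-way mirroring and under a parity code with k data blocks and m parity blocks.

```python
from fractions import Fraction


def mirroring_efficiency(copies: int) -> Fraction:
    """Fraction of raw capacity holding user data with `copies` full replicas."""
    return Fraction(1, copies)


def parity_efficiency(data_blocks: int, parity_blocks: int) -> Fraction:
    """Fraction of raw capacity holding user data in a k+m parity stripe."""
    return Fraction(data_blocks, data_blocks + parity_blocks)


# Hypothetical configurations, chosen only for illustration.
two_way = mirroring_efficiency(2)
three_way = mirroring_efficiency(3)
stripe_8_plus_1 = parity_efficiency(8, 1)

print("2-way mirroring:", two_way)
print("3-way mirroring:", three_way)
print("8+1 parity stripe:", stripe_8_plus_1)
```

Each extra mirror copy costs a full additional replica of the data, while a parity stripe spreads the cost of its parity blocks across all the data blocks in the stripe.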

Our goal is to increase the reliability of systems that currently mirror data for protection without impacting performance or space overhead. To this end, we propose the use of large parity codes across two-way mirrored reliability groups. The secondary reliability groups are defined across an arbitrarily large set of mirrored groups, necessitating a small amount of non-volatile RAM for parity. Since each parity element is stored in non-volatile RAM, our scheme drastically increases the mean time to data loss without impacting overall system performance.
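As a rough illustration of the idea, the sketch below assumes the simplest possible large-stripe code: a single XOR parity block computed across several mirrored groups. This is an assumption made for the example; the abstract does not say which parity code the paper uses, and the names `xor_blocks`, `recover_group`, and the sample block contents are hypothetical. The point it shows is that if both copies of one mirrored group are lost, the group can still be rebuilt from the parity block and the surviving groups.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_group(groups, parity, lost_index):
    """Rebuild one group whose two mirror copies are both gone.

    Uses the parity block plus one surviving copy of every other group.
    """
    survivors = [g for i, g in enumerate(groups) if i != lost_index]
    return xor_blocks(survivors + [parity])


# Hypothetical data: each entry stands for the contents of one two-way
# mirrored group (both mirror copies hold the same bytes).
groups = [b"alpha___", b"bravo___", b"charlie_", b"delta___"]

# One parity block spans all groups; in the scheme described above it
# would be kept in non-volatile RAM rather than on disk.
parity = xor_blocks(groups)

# Simulate losing both mirror copies of group 2, then rebuild it.
rebuilt = recover_group(groups, parity, lost_index=2)
print(rebuilt == groups[2])
```

With a single XOR parity block, only one fully lost group per stripe can be rebuilt; tolerating more simultaneous group losses would require a code with more parity elements.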

Publication date:
October 2007

Authors:
Kevin Greenan
Ethan L. Miller
Thomas Schwarz
Darrell D. E. Long

Projects:
Archival Storage
Reliable Storage
Ultra-Large Scale Storage

Available media

Full paper text: PDF

Bibtex entry

@inproceedings{greenan-storagess07,
  author       = {Kevin Greenan and Ethan L. Miller and Thomas Schwarz and Darrell D. E. Long},
  title        = {Disaster Recovery Codes: Increasing Reliability with Large-Stripe Error Correction Codes},
  booktitle    = {Proceedings of the 3rd International Workshop on Storage Security and Survivability (StorageSS 2007)},
  month        = oct,
  year         = {2007},
}
Last modified 5 Aug 2020