Disaster Recovery Codes: Increasing Reliability with Large-stripe Erasure Correcting Codes
Large-scale storage systems need to provide the right amount of redundancy in their storage scheme to protect client data. In particular, many high-performance systems require data protection that imposes minimal impact on performance; thus, such systems use mirroring to guard against data loss. Unfortunately, as the number of copies increases, mirroring becomes costly and contributes relatively little to the overall system reliability. Compared to mirroring, parity-based schemes are space-efficient, but incur greater update and degraded-mode read costs. An ideal data protection scheme should perform similarly to mirroring, while providing the space efficiency of a parity-based erasure code.
Our goal is to increase the reliability of systems that currently mirror data for protection, without impacting performance or space overhead. To this end, we propose the use of large parity codes across two-way mirrored reliability groups. Each secondary reliability group is defined across an arbitrarily large set of mirrored groups, so only a small amount of non-volatile RAM is needed to hold its parity. Because each parity element is stored in non-volatile RAM, our scheme drastically increases the mean time to data loss without impacting overall system performance.
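As a rough illustration of the idea in the abstract, the sketch below layers a single XOR parity block over a set of two-way mirrored groups and uses it to rebuild a group after both of its copies are lost. It is a minimal sketch under stated assumptions, not the scheme presented in the talk: the names `MirroredGroup`, `compute_parity`, and `recover` are hypothetical, plain XOR stands in for whatever large-stripe code is actually used, and the parity is held in an ordinary Python variable rather than in non-volatile RAM.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class MirroredGroup:
    """Hypothetical two-way mirrored reliability group holding one block."""

    def __init__(self, data):
        self.copies = [data, data]  # two identical replicas

    def fail(self, index):
        """Simulate the loss of one replica."""
        self.copies[index] = None

    def read(self):
        """Return the first surviving replica, or None if both are lost."""
        for copy in self.copies:
            if copy is not None:
                return copy
        return None


def compute_parity(groups):
    """XOR parity across all groups (assumed to be kept in NVRAM)."""
    return xor_blocks([g.read() for g in groups])


def recover(groups, parity, lost_index):
    """Rebuild the block of one fully lost group from the others plus parity."""
    survivors = [g.read() for i, g in enumerate(groups) if i != lost_index]
    if any(block is None for block in survivors):
        raise ValueError("single parity cannot repair more than one lost group")
    return xor_blocks(survivors + [parity])


# Usage: four mirrored groups protected by one parity block.
groups = [MirroredGroup(bytes([i] * 8)) for i in range(1, 5)]
parity = compute_parity(groups)

groups[2].fail(0)
groups[2].fail(1)  # both replicas of group 2 are gone
rebuilt = recover(groups, parity, lost_index=2)
```

In this sketch the parity costs one block for the whole stripe regardless of how many mirrored groups it spans, which is the space argument the abstract makes; normal reads still go to a mirror copy and never touch the parity.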
When:
Wednesday, October 24, 2007 at 12:30 PM
Where:
E2-599
SSRC Contact:
Greenan, Kevin
Last modified 24 May 2019