Using Disk Add-Ons to Withstand Simultaneous Disk Failures with Fewer Replicas

Contemporary storage systems that utilize replication often maintain more than two replicas of each data item, reducing the risk of permanent data loss due to simultaneous disk failures. The additional copies come at the price of less usable storage space, increased network traffic, and higher power consumption. We propose to alleviate this problem with SIMFAIL, a storage system that maintains only two replicas and utilizes per-disk “add-ons”: simple hardware devices, equipped with a relatively small amount of memory, that proxy disk I/O traffic. SIMFAIL can significantly reduce the risk of data loss due to temporally adjacent disk failures by quickly copying at-risk data from disks to their add-ons. SIMFAIL can further eliminate the risk entirely by maintaining each disk’s local parity information on its add-on, such that each add-on holds the parity of its own disk’s data chunks. We postulate that SIMFAIL may open the door for cloud providers to reduce the number of data replicas they use from three to two.
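
To make the local-parity idea concrete, the following is a minimal sketch and not the SIMFAIL implementation. It assumes that the parity is a bytewise XOR over equal-sized chunks, that the add-on remains readable after its disk fails, and that all but one of the failed disk's chunks can still be fetched from their second replicas on other disks. The names `xor_chunks` and `recover_chunk` are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks):
    """Return the bytewise XOR of a non-empty list of equal-length chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(c) != length for c in chunks):
        raise ValueError("all chunks must have the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover_chunk(parity, surviving_chunks):
    """Rebuild the single missing chunk from the parity and the other chunks.

    Assumption: exactly one chunk covered by `parity` is unavailable.
    """
    return xor_chunks([parity, *surviving_chunks])


# Chunks stored on one disk; the add-on (assumed) keeps only their parity.
disk_chunks = [b"AAAA", b"BBBB", b"CCCC"]
addon_parity = xor_chunks(disk_chunks)

# The disk fails, and the other replica of chunk 0 is lost as well.
# Chunks 1 and 2 are assumed to be readable from replicas on other disks.
recovered = recover_chunk(addon_parity, disk_chunks[1:])
```

Under these assumptions, `recovered` equals the lost chunk `disk_chunks[0]`, because XOR-ing the parity with every surviving chunk cancels their contributions. The add-on stores one chunk's worth of parity per group of chunks rather than a third full replica.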