Xoralgo

Founded in May 2018, Xoralgo builds the next generation of RAID (Redundant Array of Independent Disks). While the cornerstone of current RAID systems is the aging RAID 6, our RAID is based on a new algorithm utilizing five redundant disks. Our technology is called PentaRAID™.

The problem we address

A famous data loss study at CERN exposed data loss due to causes other than an apparent disk failure, in particular errors that RAID 6 cannot correct (dual bit errors). The CERN study has been confirmed by others. Before PentaRAID™, these and other problems were too costly to address. Now there is a solution.

The main limitation of RAID 6 is that it uses only two redundant disks. This makes it vulnerable to data loss. One cause of data loss is a catastrophic disk failure. Another cause, gaining importance today, is silent data corruption due to undetected disk errors (UDE). UDE are a product of the laws of physics and occur during "normal" disk operation. In the era of huge disk drives, a UDE may occur every few hours. IT departments often refer to a UDE as a "bit flip", as a UDE may indeed occur as a result of repeatedly reading a bit of data from a disk. Every 10^14 reads or so, the bit will be read incorrectly, and this error may propagate in calculations (such as calculating the total of a giant spreadsheet), rendering the entire calculation invalid.
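To see how an error rate of about one misread per 10^14 bit reads translates into "every few hours", here is a back-of-the-envelope sketch. The sustained throughput of 200 MB/s per disk and the array size of 15 disks are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
# Rough estimate only. BITS_PER_ERROR comes from the text above;
# the throughput and disk count used below are assumed for illustration.
BITS_PER_ERROR = 1e14


def hours_between_errors(throughput_mb_per_s: float, disks: int = 1) -> float:
    """Expected hours between misread bits for disks read continuously."""
    bits_per_second = throughput_mb_per_s * 1e6 * 8 * disks
    return BITS_PER_ERROR / bits_per_second / 3600


if __name__ == "__main__":
    print(f"1 disk:   {hours_between_errors(200):.1f} hours")
    print(f"15 disks: {hours_between_errors(200, disks=15):.1f} hours")
```

Under these assumptions, a single busy disk would see a misread roughly every 17 hours, and a 15-disk array roughly every hour.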

RAID 6
A common RAID 6 configuration. Source: en:User:Cburnett, RAID 6, Converted to Inkscape 0.92, CC BY-SA 3.0.
PentaRAID
A PentaRAID™ configuration with 15 disks. Current implementation limit: 259 disks.

The inadequacy of RAID 6 is well known. The failure of one disk requires its replacement and the reconstruction of its data. This process may take days or weeks for the typical multi-terabyte disks of today. A RAID operating with a failed disk is said to be in degraded mode. Thus, RAID systems of today spend a significant amount of time operating in degraded mode. RAID 6 in this mode can detect UDE but cannot correct them, which results in data loss.
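The "days or weeks" figure follows from simple arithmetic: rebuild time is roughly disk capacity divided by the effective rebuild rate. The sketch below uses a hypothetical 16 TB disk and two assumed rebuild rates (50 MB/s on an idle array, 10 MB/s on one that is also serving user I/O). These numbers are illustrative, not measured.

```python
def rebuild_days(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Days needed to rewrite a whole disk at a constant rebuild rate."""
    seconds = capacity_tb * 1e12 / (rebuild_mb_per_s * 1e6)
    return seconds / 86400


if __name__ == "__main__":
    # Assumed values: 16 TB disk, idle vs. heavily loaded array.
    print(f"idle array:   {rebuild_days(16, 50):.1f} days")
    print(f"loaded array: {rebuild_days(16, 10):.1f} days")
```

With these assumed rates the rebuild takes a little under 4 days on an idle array and more than 18 days under load, during which the array stays in degraded mode.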

It is not unlikely that a second disk fails during the recovery of the first failed disk. This results in doubly degraded mode. While no data loss is imminent for a RAID 6 system operating in doubly degraded mode, in this mode RAID 6 is not capable of detecting or correcting any errors. Thus, any UDE remains undetected and uncorrected by a RAID 6 operating in doubly degraded mode.

How do we solve the problem?

We scale up the RAID 6 architecture to use five parity blocks per stripe (P1-P5). This results in a dramatic improvement in error protection (an estimated 20 orders of magnitude over RAID 6). PentaRAID™ is a flat RAID, i.e. there is no need to nest one RAID within another, which avoids many complexities of, say, RAID 60. IT personnel familiar with RAID 6 will easily adapt to the new technology.
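The effect of extra parity blocks on degraded-mode behavior can be sketched with the standard coding-theory bound: a code with minimum distance d handles e known failures (erasures) and t errors at unknown locations whenever 2t + e ≤ d - 1. The sketch assumes that a RAID with r parity blocks has d = r + 1. That holds for RAID 6 and is consistent with the test results reported on this page, but it is an assumption about PentaRAID™, not a statement taken from the white paper. The function name is hypothetical.

```python
def tolerance(parity_blocks: int, failed_disks: int) -> tuple[int, int]:
    """Return (correctable, detectable) silent errors per stripe.

    Assumes minimum distance parity_blocks + 1, so that
    2 * correctable + failed_disks <= parity_blocks and
    detectable + failed_disks <= parity_blocks.
    """
    spare = parity_blocks - failed_disks
    if spare < 0:
        raise ValueError("more failed disks than parity blocks: data is lost")
    return spare // 2, spare


if __name__ == "__main__":
    for name, parity in (("RAID 6", 2), ("five parity blocks", 5)):
        for failed in range(parity + 1):
            fix, detect = tolerance(parity, failed)
            print(f"{name}, {failed} failed: corrects {fix}, detects {detect}")
```

For two parity blocks this reproduces the behavior described above: one correctable error when healthy, detection only in degraded mode, and neither in doubly degraded mode. With five parity blocks, two errors remain correctable even with one disk failed.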

What is the "secret sauce"?

Well, it is not really secret, as the theory of PentaRAID™ is published in our white paper and in the patent application soon to be published by the USPTO. But, in short, we invented a new error-correcting code which is used to compute the five parity blocks (P1-P5), and which has an efficient decoder. Therefore, PentaRAID™ has comparable efficiency to RAID 6, with much better protection against data loss, including loss caused by UDE. Also, our code is systematic, which means that non-parity blocks contain the original user data, unlike the original Reed-Solomon codes.
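To illustrate what "systematic" means, here is a minimal sketch of a stripe encoder that leaves the data blocks untouched and appends parity blocks. It uses the well-known RAID 6-style P and Q parities over GF(2^8), not the PentaRAID™ code, whose construction is given in the white paper. All function names are hypothetical.

```python
# Illustration of a systematic code using RAID 6-style P/Q parity.
# This is NOT the PentaRAID code; it only shows the systematic layout.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def encode_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks unchanged, followed by parity blocks P and Q."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    coeff = 1  # g^i for block i, with generator g = 2
    for block in data_blocks:
        for j, byte in enumerate(block):
            p[j] ^= byte                 # P: plain XOR of the data blocks
            q[j] ^= gf_mul(coeff, byte)  # Q: XOR of g^i * D_i in GF(2^8)
        coeff = gf_mul(coeff, 2)
    return list(data_blocks) + [bytes(p), bytes(q)]


def recover_one(stripe: list[bytes], lost_index: int, n_data: int) -> bytes:
    """Rebuild one lost data block by XOR-ing P with the surviving data blocks."""
    rebuilt = bytearray(stripe[n_data])  # start from P
    for i in range(n_data):
        if i != lost_index:
            for j, byte in enumerate(stripe[i]):
                rebuilt[j] ^= byte
    return bytes(rebuilt)


if __name__ == "__main__":
    data = [b"user", b"data", b"here"]
    stripe = encode_stripe(data)
    print(stripe[:3] == data)            # data blocks are stored verbatim
    print(recover_one(stripe, 1, 3))     # block 1 rebuilt from P and the others
```

Because the data blocks are stored verbatim, reads on a healthy array never need to run the decoder. Decoding is only required when a block is missing or fails a consistency check.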

Proof-of-concept: Ubuntu virtual machine

A PentaRAID™ implementation has been built to demonstrate the viability and usefulness of the technology. As an example, we built a virtual machine (utilizing Oracle's VirtualBox) with 15 virtual disks (represented by regular files) and installed the Ubuntu operating system utilizing PentaRAID™. Ubuntu partitioned the PentaRAID™ storage as usual (using the ext4 filesystem). In one of the experiments, we benchmarked the swap partition of Ubuntu, observing performance slightly under 2 GB/s. This experiment was performed on a $1,600 laptop with a quad-core processor and a single mSATA SSD. The takeaways are:

  • Complete operating systems can run on our RAID, utilizing native file systems (unlike NFS).
  • PentaRAID™ provides extreme performance (and superb error protection) even on modestly priced hardware.

We subjected PentaRAID™ to various tests, including simultaneous removal of up to 4 (or even 5) disks. We also tested for loss of data from two simultaneous random errors on randomly selected disks (errors called Undetected Disk Errors, as most disks and RAID systems are incapable of detecting them). PentaRAID™ guarantees recovery from these types of errors.

Ubuntu on PentaRAID
Ubuntu on PentaRAID™. Benchmarking of the swap partition.