[bitfolk] Interesting (?) Linux RAID-10 performance caveat

Author: Andy Smith
Date:  
To: users
Subject: [bitfolk] Interesting (?) Linux RAID-10 performance caveat

gpg: Signature made Thu May 30 02:11:40 2019 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
Hello,

A new BitFolk server that I will put into service soon has 1x SSD
and 1x NVMe instead of 2x SSD. I tried this because the NVMe,
despite being vastly more performant than the SATA SSD, is actually
a fair bit cheaper. On the downside it only has a 3 year warranty
(vs 5) and 26% of the write endurance (5466TBW vs 21024TBW)¹.
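The endurance figures above can be checked with a few lines of
arithmetic. This is only a sketch: the function names are made up
for illustration, and the lifetime scaling assumes the write rate
stays the same and that expected life is proportional to rated TBW.

```python
def endurance_ratio(tbw_a, tbw_b):
    """Fraction of device B's rated write endurance that device A has."""
    return tbw_a / tbw_b


def scaled_lifetime_years(baseline_years, ratio):
    """Expected life if endurance shrinks by `ratio` at an unchanged
    write rate (assumes life is proportional to rated TBW)."""
    return baseline_years * ratio


nvme_tbw = 5466   # rated endurance of the NVMe, from the figures above
ssd_tbw = 21024   # rated endurance of the SATA SSD

ratio = endurance_ratio(nvme_tbw, ssd_tbw)
print(f"NVMe has {ratio:.0%} of the SSD's write endurance")
print(f"100 years of expected life becomes about "
      f"{scaled_lifetime_years(100, ratio):.0f} years")
```

So a device showing 100 years of expected life would show roughly a
quarter of that with the lower-endurance part.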

So anyway, a pair of very imbalanced devices. I decided to take some
time to play around with RAID configurations to see how Linux MD
handled that. The results surprised me, and I still have many open
questions.

As background, the general advice for a long time has been that
Linux RAID-10 gives the highest random IO performance. This is
because it can stripe read IO across multiple devices, whereas with
RAID-1, a single process will do IO to a single device.

Linux's non-standard implementation of the RAID-10 algorithm can
also generalise to any number of devices: conventional RAID-10
requires an even number of devices with a minimum of 4, but Linux
RAID-10 can work with 2 or even an odd number.

More info about that:
https://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
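To illustrate how that generalisation works, here is a minimal
sketch of the default "near" layout with 2 copies, as described on
the Wikipedia page above. It is a toy model of chunk placement only,
not the kernel's code, and the function name near_layout is made up
for this example.

```python
def near_layout(n_devices, n_chunks, copies=2):
    """Return stripe rows for an md-style RAID-10 "near" layout.

    Each row is a list with one entry per device, holding the number
    of the data chunk stored there. Copies of a chunk are written to
    consecutive slots, wrapping on to the next row when a row fills.
    """
    rows = {}
    for chunk in range(n_chunks):
        for copy in range(copies):
            slot = chunk * copies + copy
            row, dev = divmod(slot, n_devices)
            rows.setdefault(row, [None] * n_devices)[dev] = chunk
    return [rows[r] for r in sorted(rows)]


# 2 devices: every chunk lands on both devices, same data as RAID-1.
for row in near_layout(2, 3):
    print(row)

# 3 devices: copies of a chunk can straddle two rows.
for row in near_layout(3, 3):
    print(row)
```

With 2 devices each row holds the same chunk on both devices, so the
on-disk data placement matches RAID-1 and any performance difference
has to come from how reads are scheduled, not from where the data is.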

As a result, for 10+ years I have rarely felt the need to use RAID-1.

But I ran these benchmarks and found that RAID-1 is THREE TIMES
FASTER than RAID-10 on a random read workload with these
imbalanced devices.

Here is a full write up:
http://strugglers.net/~andy/blog/2019/05/29/linux-raid-10-may-not-always-be-the-best-performer-but-i-dont-know-why/

I can see and replicate the results, and I can tell that it's
because RAID-1 is able to direct the vast majority of reads to the
NVMe, but I don't know why that is or if it is by design.

I also have some other open questions, for example one of my tests
against HDD is clearly wrong as it achieves 256 IOPS, which is
impossible for a 5,400RPM rotational drive.
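A rough sanity check on that figure, as a sketch: for uncached
random reads with one request outstanding at a time, a rotational
drive waits on average half a revolution per request, plus seek
time. The 10ms average seek used below is an assumed, typical-looking
value, not a measurement of this drive, and the function name is made
up for illustration.

```python
def max_random_iops(rpm, avg_seek_ms=0.0):
    """Upper bound on random IOPS at queue depth 1 with no cache hits.

    Average rotational latency is half a revolution; avg_seek_ms is
    added on top of that for each request.
    """
    rotation_ms = 60_000 / rpm
    avg_rotational_latency_ms = rotation_ms / 2
    return 1000 / (avg_rotational_latency_ms + avg_seek_ms)


# Rotational latency alone, pretending seeks are free.
print(round(max_random_iops(5400)))

# With an assumed 10ms average seek.
print(round(max_random_iops(5400, avg_seek_ms=10.0)))
```

Even with free seeks the bound is about 180 IOPS, so 256 IOPS
suggests the test was being served partly from a cache, was not
fully random, or had multiple requests in flight that the drive
could reorder.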

So if you have any comments, explanations, or ideas about how my
testing methodology might be wrong, I would be interested to hear
them.

Cheers,
Andy

¹ I do however monitor the write capacity of BitFolk's SSDs and they
all show 100+ years of expected life, so I am not really bothered
if that drops to 25 years.

--
https://bitfolk.com/ -- No-nonsense VPS hosting