Author: Andy Smith
Subject: [bitfolk] Ubuntu 10.04 on BitFolk (Was: Re: Support this weekend / Ubuntu Lucid LTS release)

 - If the customer needs to buy SSD-backed storage then they can.
 - If the caching is good enough then no one would feel the need to
   buy SSD anyway, so why add complexity?
 - If people buy all of the SSD, does that reduce the caching benefit
   to zero and suddenly screw everyone else over?
   Presumably SSD-backed storage could be priced such that if a lot
   of people did buy it, it would be economical to go out and buy a
   pair of larger ones and swap them over without downtime[2].
So, if anyone has any thoughts on this I'd be interested in hearing
them.
If you had an IO latency problem, would you know how to diagnose it
to determine that it was something you were doing, as opposed to
"BitFolk's storage is overloaded but it's not me"?

If you could do that, would you be likely to spend more money on
SSD-backed storage?

If we came to you and said that your VPS service was IO-bound and
would run faster if you bought some SSD-backed storage, do you think
that you would?[3]
My gut feeling at the moment is that while I would love to be
feeding the geek inside everyone and offering eleventy-billion
choices, demand for SSD-backed storage at an additional cost will be
low.
I also think it's going to be very difficult for an admin of a
virtualised block device to tell the difference between:

    "All my processes are really slow at talking to storage; it's
    because of my process ID 12345 which is a heavy DB query"

and:

    "All my processes are really slow at talking to storage; that's
    definitely a problem with BitFolk's storage and not anything I
    am doing."
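For what it's worth, a rough first step in that diagnosis is to watch
your own processes' disk counters. Below is a minimal sketch of mine
(not a BitFolk tool) that samples /proc/<pid>/io on a Linux guest; the
one-second interval and top-five cut-off are arbitrary assumptions:

```python
#!/usr/bin/env python3
# Illustrative sketch: sample per-process disk IO counters from
# /proc/<pid>/io twice, then rank processes by bytes of disk traffic
# in the interval. Only processes we have permission to inspect
# (normally your own, unless run as root) are counted.
import os
import time

def read_io_counters():
    """Return {pid: (read_bytes, write_bytes)} for inspectable processes."""
    counters = {}
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open('/proc/%s/io' % pid) as f:
                fields = dict(line.split(': ') for line in f.read().splitlines())
            counters[int(pid)] = (int(fields['read_bytes']),
                                  int(fields['write_bytes']))
        except (OSError, KeyError, ValueError):
            continue  # process exited, or we lack permission to read it
    return counters

def top_io_consumers(interval=1.0, limit=5):
    """Sample twice, `interval` seconds apart; return [(bytes, pid), ...]."""
    before = read_io_counters()
    time.sleep(interval)
    after = read_io_counters()
    deltas = []
    for pid, (read_now, write_now) in after.items():
        read_then, write_then = before.get(pid, (read_now, write_now))
        total = (read_now - read_then) + (write_now - write_then)
        if total > 0:
            deltas.append((total, pid))
    return sorted(deltas, reverse=True)[:limit]

if __name__ == '__main__':
    for total, pid in top_io_consumers():
        print('pid %d: %d bytes of disk IO in the last second' % (pid, total))
```

If nothing on your own VPS shows significant traffic while the whole
guest feels slow, that's at least weak evidence the contention is
elsewhere — though, as above, it's hard to be sure from inside a
virtualised block device.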
By the way, I think we've done reasonably well at keeping IO latency
down, over the years:

barbar:    http://tools.bitfolk.com/cacti/graphs/graph_1634_6.png
bellini:   http://tools.bitfolk.com/cacti/graphs/graph_2918_4.png
cosmo:     http://tools.bitfolk.com/cacti/graphs/graph_2282_4.png
curacao:   http://tools.bitfolk.com/cacti/graphs/graph_1114_6.png
dunkel:    http://tools.bitfolk.com/cacti/graphs/graph_1485_6.png
faustino:  http://tools.bitfolk.com/cacti/graphs/graph_1314_6.png
kahlua:    http://tools.bitfolk.com/cacti/graphs/graph_1192_6.png
kwak:      http://tools.bitfolk.com/cacti/graphs/graph_1113_6.png
obstler:   http://tools.bitfolk.com/cacti/graphs/graph_1115_6.png
president: http://tools.bitfolk.com/cacti/graphs/graph_2639_4.png
urquell:   http://tools.bitfolk.com/cacti/graphs/graph_2013_6.png
(Play-at-home quiz: which four of the above do you think have eight
disks instead of four? Which one has four 10kRPM SAS disks? Answers
at [4])
In general we've found that keeping the IO latency below 10ms keeps
people happy.
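If you want a crude feel for that number from inside your own VPS,
something like the sketch below times synchronous writes. This is my
own illustration, not an official benchmark; the 4KiB write size and
sample count are arbitrary, and the temporary file may not live on the
same device as your data:

```python
#!/usr/bin/env python3
# Illustrative check of synchronous write latency as seen from inside
# a guest: time os.fsync() on small appends to a temporary file and
# report the median in milliseconds.
import os
import statistics
import tempfile
import time

def fsync_latency_ms(samples=20, block=4096):
    """Median time in ms for a small write + fsync, over `samples` runs."""
    with tempfile.NamedTemporaryFile() as f:
        timings = []
        for _ in range(samples):
            f.write(b'\0' * block)
            f.flush()
            start = time.perf_counter()
            os.fsync(f.fileno())  # force the write down to stable storage
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

if __name__ == '__main__':
    print('median fsync latency: %.2f ms' % fsync_latency_ms())
```

A median around or under 10ms would be consistent with the target
above; sustained numbers well beyond that are worth investigating.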
There have been short periods where we've failed to keep it below
10ms and I'm sure that many of you can remember times when you've
found your VPS sluggish. Conversely, I suspect that not many
customers can think of times when their VPSes have been the *cause*
of high IO load, yet high IO load is in general only caused by
customer VMs! So for every time you have experienced this, someone
else was causing it![5]
I think that, being in the business of providing virtual
infrastructure at commodity prices, we can't really expect too many
people to want, or be able, to take the time to profile their storage
use and make a call on what needs to be backed by SATA or SSD.

I think we first need to try to make it as good as possible for
everyone, always. There may be a time in the future when it's
commonplace for customers to evaluate storage in terms of IO
operations per second instead of gigabytes, but I don't think we are
there yet.
As for the "low-end customers subsidise higher-end customers"
argument, that's just how shared infrastructure works and is already
the case for many existing metrics, so what's one more? While we
continue to have no good way to ration out IO capacity, it is
difficult to add it as a line item.
So, at the moment I'm more drawn to the "both" option but with the
main focus being on caching with a view to making it better for
everyone, and hopefully overall