[bitfolk] Problems with "snaps" between 03:51 and 06:41 toda…

Author: Andy Smith
Date:  
To: announce
Subject: [bitfolk] Problems with "snaps" between 03:51 and 06:41 today

gpg: Signature made Thu Mar 1 07:03:54 2018 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]
Hi,

Around 04:00Z I received alerts that host "snaps" had unexpectedly
rebooted. Upon investigating, I found that it had indeed reset itself,
for reasons unknown, starting at about 03:51Z. It was neither a full
power cycle nor a graceful shutdown; it just reset itself with no
useful log output.

Whilst all VPSes did seem to boot up okay, unfortunately it soon
became clear that "snaps" had booted into an earlier version of the
hypervisor - one without the recent Spectre/Meltdown (and
other) security fixes that were deployed last week.

At this point customer VPSes on "snaps" were operating normally
again, but they could not be left in that insecure state, so after
some time spent investigating, between 06:17Z and 06:37Z I did a
clean shutdown and booted into the correct version of the
hypervisor again.

I have since established why the incorrect boot entry was
automatically chosen¹ and have fixed that problem. I have not
worked out what caused "snaps" to reset itself. We have been having
some stability issues with "snaps" over the last 6 months and I
think we are going to have to decommission it.

I will come up with a plan and contact customers on "snaps" directly
later today, but in the meantime if your VPS is on "snaps" and you
wish for it to be moved to another server as a priority please
contact support@??? and we'll get that done. It will involve
shutting your VPS down and booting it a few seconds later on the
target server. None of the details of your VPS will change. Please
indicate what sort of time of day would be best for that to happen.

Apologies for the disruption this will have caused you.

Cheers,
Andy

¹ The newer hypervisor package ships an override to make sure that
the server boots into the hypervisor by default at the next boot.
This is meant to make it easier for people, but all it did was
override my actual intentionally-set default boot option with one
that wasn't suitable. This was not noticed in testing because the
testing machines had no other versions of the hypervisor present.
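
As an illustration only, the effect described in this footnote can be
sketched as a last-writer-wins merge of configuration fragments. The
fragment names, the "default_entry" key and its values are all
hypothetical and are not taken from the actual boot configuration on
"snaps"; the point is only that a fragment applied later silently
replaces an earlier, intentionally-set value.

```python
# Hypothetical sketch: boot settings assembled from ordered fragments,
# where a later fragment overrides an earlier one (last writer wins).
# None of these names come from the real configuration on "snaps".

def effective_settings(fragments):
    """Merge (source, settings) pairs in order; later values override."""
    merged = {}
    origin = {}
    for source, settings in fragments:
        for key, value in settings.items():
            merged[key] = value
            origin[key] = source  # remember which fragment set it last
    return merged, origin


def sources_setting(fragments, key):
    """List every fragment that sets `key`, in application order."""
    return [source for source, settings in fragments if key in settings]


fragments = [
    # The administrator's intentionally-set default (hypothetical value).
    ("admin-config", {"default_entry": "hypervisor-patched"}),
    # An override shipped by a package, applied afterwards (hypothetical).
    ("package-override", {"default_entry": "first-hypervisor-found"}),
]

merged, origin = effective_settings(fragments)
conflicts = sources_setting(fragments, "default_entry")

print("effective default:", merged["default_entry"])
print("set by:", origin["default_entry"])
if len(conflicts) > 1:
    print("warning: default_entry set more than once:", conflicts)
```

A check like the final warning is the sort of thing that only fires
when more than one candidate is present, which matches why the problem
did not show up on test machines with a single hypervisor version.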

--
https://bitfolk.com/ -- No-nonsense VPS hosting
_______________________________________________
announce mailing list
announce@???
https://lists.bitfolk.com/mailman/listinfo/announce