Re: [bitfolk] Scheduled maintenance on 24–26 July

Author: Andy Smith
Date:  
To: announce
New-Topics: [bitfolk] Save/restore failures (was Re: Scheduled maintenance on 24–26 July)
Subject: Re: [bitfolk] Scheduled maintenance on 24–26 July

gpg: Signature made Sun Jul 24 02:28:06 2016 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]
Hello,

The work scheduled for 2016-07-24 has now been completed. Further
work on the remaining hosts is scheduled for the next two days.

Unless your VPS is on host "hen" there were no unexpected issues and
you can stop reading now.

A mistake was made when working on host "hen": instead of shutting
down customer VPSes prior to rebooting the host, I accidentally
saved them to disk.

Once I'd realised what I'd done, I decided it would be best to let
them restore from disk on boot. Otherwise they would all have
experienced something similar to a power failure and had to
replay/repair their filesystems on boot.

In theory save/restore is a great idea (it's similar to hibernation)
as it results in less disruption, but unfortunately not all Linux
kernels cope well with it, which is why we don't normally do it. I
prefer to give you the greater certainty of a clean shutdown and
boot.

So, if your VPS is on "hen" and…

- You're wondering why it still shows the same uptime as before…

…it's because it was saved/restored rather than rebooted.

- It's broken in some way…

…it's unfortunately very likely because it didn't handle
save/restore well. A clean reboot will probably be necessary.
Sorry about this. :(
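
If you'd like to check from inside your VPS which of the two cases
applies, the rough sketch below estimates the boot time from the
uptime and compares it with the start of the maintenance. It is only
an illustration: the function names are made up for this example, it
assumes a Linux guest with /proc/uptime, and you have to supply the
maintenance start time yourself, as the exact time is not given here.
Depending on how the guest clock behaves across save/restore the
estimate may be off by the length of the pause.

```python
from datetime import datetime, timedelta, timezone


def read_uptime_seconds(path="/proc/uptime"):
    """Return the guest's uptime in seconds (first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def boot_time(uptime_seconds, now):
    """Estimate when the guest last booted by subtracting uptime from now."""
    return now - timedelta(seconds=uptime_seconds)


def was_rebooted_since(uptime_seconds, now, maintenance_start):
    """True if the estimated boot time is at or after maintenance_start.

    False suggests the guest kept its old uptime, i.e. it was
    saved/restored rather than rebooted.
    """
    return boot_time(uptime_seconds, now) >= maintenance_start
```

For example, with an assumed (not actual) maintenance start of
midnight UTC on 2016-07-24 you could call
was_rebooted_since(read_uptime_seconds(), datetime.now(timezone.utc),
datetime(2016, 7, 24, tzinfo=timezone.utc)). A result of False would
be consistent with a save/restore.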

I went through every VPS on "hen" that was showing problems in
Nagios and I found only two that didn't handle the save/restore. One
was running Debian 6.0 and the other was running Ubuntu 16.04. I
rebooted these and Nagios is now happy about them.

This does not mean that every Debian 6.0 or Ubuntu 16.04 VPS will
fail to save/restore. Also, very few customers have Nagios alerting,
so there may be other VPSes that didn't handle it well and that I am
unaware of.

Cheers,
Andy

--
http://bitfolk.com/ -- No-nonsense VPS hosting