[bitfolk] Suspend/restore causes clock oddities (Was Re: Mai…

Author: Andy Smith
Date:  
To: users
Old-Topics: [bitfolk] Maintenance plans for Friday 23rd night / Saturday 24th morning
Subject: [bitfolk] Suspend/restore causes clock oddities (Was Re: Maintenance plans for Friday 23rd night / Saturday 24th morning)

gpg: Signature made Sat Jul 24 10:03:30 2010 UTC using DSA key ID BF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>"
gpg: aka "Andrew James Smith <andy@strugglers.net>"
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>"
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>"
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>"
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>"
Hello,

On Wed, Jul 21, 2010 at 08:06:10PM +0000, Andy Smith wrote:
> Hopefully you recall the scheduled maintenance coming up on Friday
> night / Saturday morning for faustino:
>
> http://lists.bitfolk.com/lurker/message/20100614.062104.ec347682.en.html
>
> I'm going to have to bring it forward by about 15 minutes, so at
> approximately 2245Z (quarter to midnight UK time) all VPSes on
> faustino will be shut down. The machine will then be moved to a
> different suite and powered up again. This should take less than
> half an hour.


Those watching the tweets will have seen all of this already, but
here's the summary.

The faustino maintenance was necessary because Telehouse want the
remaining rack tenants in TFM1 to move to new racks so they can
finish refitting the suite. This was planned months ago.

On the night, it turned out that Telehouse had managed to allocate a
new rack location that didn't actually exist. Everyone thought it
did exist until the point where the floor tiles were to be lifted
and power switched on, at which point there wasn't anything to turn
on.

There was no possibility of rescheduling, and given the late hour it
took a long time to get to the point where the move could be made.
This delayed the start of the maintenance from 2245Z to around 0119Z
and made it slightly longer than it was intended to be
(approximately 40 minutes, instead of the announced maximum of 30
minutes).

> After that, from 0000Z (1am UK time) I shall be shutting down
> urquell to put some more RAM in it. This should also take less than
> 30 minutes.


This went according to plan. urquell was down between about 0009Z
and about 0027Z.

In both the faustino and urquell cases this morning I suspended the
VMs to disk and restored them afterwards. This appears to have
caused clocks to go a bit crazy, since they saw no ticks while they
were suspended. On restore, ntpd will have noted the massive skew
and killed itself, like this:

Jul 24 00:22:16 spamd3 ntpd[13408]: synchronized to 209.237.247.192, stratum 3
Jul 24 00:22:16 spamd3 ntpd[13408]: time correction of 1365 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.

Those with VPSes on faustino or urquell may wish to check that their
time is correct and that ntpd is running.
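
If you'd rather search your logs than eyeball them, here is a rough
sketch in Python that picks out the "exceeds sanity limit" message
shown above. The regular expression is an assumption based only on
that one sample line, and find_panic_exits is just a name made up for
this illustration; adjust both to suit whatever your syslog actually
contains.

```python
import re

# Pattern derived from the single sample log line quoted above (an assumption,
# not taken from ntpd's source). It captures the ntpd PID, the offset that
# ntpd refused to correct, and the sanity limit it reported.
PANIC_RE = re.compile(
    r"ntpd\[(?P<pid>\d+)\]: time correction of (?P<offset>-?\d+) seconds "
    r"exceeds sanity limit \((?P<limit>\d+)\)"
)


def find_panic_exits(lines):
    """Return a list of (pid, offset_seconds, limit_seconds) tuples,
    one for each line that matches the sanity-limit message."""
    hits = []
    for line in lines:
        m = PANIC_RE.search(line)
        if m:
            hits.append((int(m["pid"]), int(m["offset"]), int(m["limit"])))
    return hits


# The two log lines from this message, used as sample input.
sample = [
    "Jul 24 00:22:16 spamd3 ntpd[13408]: synchronized to 209.237.247.192, stratum 3",
    "Jul 24 00:22:16 spamd3 ntpd[13408]: time correction of 1365 seconds "
    "exceeds sanity limit (1000); set clock manually to the correct UTC time.",
]

for pid, offset, limit in find_panic_exits(sample):
    print(f"ntpd pid {pid}: offset {offset}s exceeded limit {limit}s")
```

To run it against a real log, pass an open file object to
find_panic_exits in place of the sample list. Any hit means ntpd gave
up at that point and will need the clock set and the daemon restarted.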

In future, would you prefer that we avoid this suspend/restore
approach and instead just do a normal shutdown/boot?

By the way, if you ask for it we can monitor your ntpd at no charge,
which would alert you if it wasn't running or if it didn't have
sync. Email to support@??? if you want that.

Cheers,
Andy

-- 
http://bitfolk.com/ -- No-nonsense VPS hosting