Author: Andy Smith
To: announce
Subject: [bitfolk] 2021-05-15 ~17:28 BST (16:28Z) - Unscheduled power cycle of server "macallan"

gpg: Signature made Sat May 15 17:00:44 2021 UTC
gpg: using DSA key 0E4236CB52951E14536066222099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]

At approximately 17:28 BST we started receiving numerous alerts for
server "macallan" and customer services on it. Upon investigation I
was unable to connect to the IPMI console of the server.

I got in contact with the colo provider, who quickly realised that
they were doing work in that rack and had knocked out the power
cable for this server.

The server started booting around 17:35 and all customer VMs had
booted by 17:47.

We use locking power cables in our servers to try to minimise this
sort of thing, but they only lock at one end - the server end. The
server's power cord had come loose at the other end.

"macallan" is one of our older servers, which has a single power
supply unit (PSU). To mitigate that risk it plugs into an automatic
transfer switch (ATS) so that its single PSU continues to receive
power even if one of the rack's two PDUs or power feeds fails.
Unfortunately that does not protect it against its single power cord
coming out of the ATS.
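
As a rough illustration of the reasoning above, the two power
arrangements can be modelled as boolean logic. This is only a sketch:
the function and parameter names are hypothetical, and it assumes a
dual-PSU server has one independent cord and feed per PSU.

```python
# Hypothetical model of the power paths described above.
# All names are illustrative; none come from BitFolk's tooling.

def single_psu_has_power(feed_a: bool, feed_b: bool,
                         ats_ok: bool, cord_seated: bool) -> bool:
    """Single-PSU server behind an ATS.

    The ATS lets either feed supply power, but the ATS itself and the
    one cord from ATS to server sit in series, so each is a single
    point of failure.
    """
    return (feed_a or feed_b) and ats_ok and cord_seated


def dual_psu_has_power(path_a_ok: bool, path_b_ok: bool) -> bool:
    """Dual-PSU server (assumed: one cord and one feed per PSU).

    Each argument is True when that whole path (feed, cord, PSU) is
    intact; the server stays up while at least one path works.
    """
    return path_a_ok or path_b_ok


# Losing one feed is survivable with the ATS in place.
print(single_psu_has_power(False, True, True, True))
# The cord coming out of the ATS is not, whatever the feeds are doing.
print(single_psu_has_power(True, True, True, False))
# With dual PSUs, one dislodged cord leaves the other path working.
print(dual_psu_has_power(False, True))
```

In this model the dislodged cord is the one input that the ATS cannot
compensate for, which matches what happened here.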

We have started a hardware refresh and the new-spec servers have
dual PSUs, which should help to avoid incidents like this in future.

Please accept my apologies for this disruption.
Andy Smith
BitFolk Ltd

https://bitfolk.com/ -- No-nonsense VPS hosting