Author: Andy Smith
Date:  
To: announce
Old-Topics: [bitfolk] Host "hen" unexpectedly rebooted 2018-11-26 22:24
Subject: [bitfolk] Host "hen" crashed again (Was: Re: Host "hen" unexpectedly rebooted 2018-11-26 22:24)

gpg: Signature made Mon Dec 10 15:25:14 2018 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]
Hi,

On Mon, Nov 26, 2018 at 10:41:43PM +0000, Andy Smith wrote:
> At approximately 22:24Z, host "hen" rebooted itself unexpectedly.


This unfortunately has happened again today, at about 14:23Z.

This time I was logging the serial console to a file and so am able
to see that there was the equivalent of a kernel panic in the
hypervisor.

That is, I do not believe that hen's hardware is at fault. I think
it's tripping over a bug in Xen, and it's happened to the same
host twice because it's been triggered by the same guest doing
something (which I do not believe is malicious at this stage).

I've not got a quick fix for this because moving all customers on hen
to new hardware is likely just going to crash the hypervisor on the
other hardware. I need to discuss the problem with the Xen
developers and see if I get anywhere.

In between last time and this I also built a new version of the
hypervisor and set every host to boot into it, so hen is now
actually running a very slightly newer version than everything else
(and also newer than what it was running before). This could
possibly help, just by chance, though as far as I am aware this is
not a known bug.

So I am very sorry but I am going to have to ask you to bear with me
for a little while, while I investigate this more. Until I can
establish which guest triggered it I can't move any of the customers
on host hen to other hosts because that would possibly just trigger
it elsewhere. And it could still happen elsewhere anyway.

If I don't make headway with this then I can revert to earlier
versions that we've been stable on for a long time, but security
issues have been fixed since then so I'm not going to do that except
as a last resort.

I will provide more information as soon as I can.

Thanks,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting
_______________________________________________
announce mailing list
announce@???
https://lists.bitfolk.com/mailman/listinfo/announce