Author: Andy Smith
Date:  
To: announce
Subject: Re: [bitfolk] Server "elephant" has crashed a few times, ongoing problems

gpg: Signature made Fri Oct 23 16:37:48 2020 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]
Hi,

On Fri, Oct 23, 2020 at 11:46:11AM +0000, Andy Smith wrote:
> On Fri, Oct 23, 2020 at 11:19:21AM +0000, Andy Smith wrote:
> > I'm trying to isolate the issue to one particular VM because if a
> > guest can crash the host then it's a bug in the hypervisor and
> > just moving guests around won't solve the problem.
>
> I can't find it. As we have had problems with elephant before I'm
> going to assume hardware problem and start moving customer VMs to
> other hosts.


While moving customer VMs to other hosts, booting one of them caused
server "macallan" to crash in exactly the same way. So, I am ruling
out hardware issues with "elephant".

By preventing this particular VM from booting I was able to boot all
of the other VMs on "macallan". I have some hope that it is just
this one VM that is tickling a particularly nasty bug.

I am now going to try starting the remaining VMs on "elephant".

If that is successful I will then move the suspect VM to test
hardware to see if I can reproduce the crash there.

I am confused because I am sure I tried reverting last weekend's
hypervisor upgrade to the previous version while investigating
matters on "elephant", yet it still crashed. Possibly I made a
mistake (e.g. booted with the wrong hypervisor).

Also, everything obviously booted up fine last weekend when I did
the maintenance, so possibly this customer has found a new and
unrelated bug.

The best case at this point is that I can reproduce the problem with
just that one VM, report it, get it fixed and then have to reboot
everything to deploy the fix.

Cheers,
Andy

--
https://bitfolk.com/ -- No-nonsense VPS hosting