Author: Andy Smith
To: announce
Subject: [bitfolk] 2020-12-27 ~00:00 – ~00:45 internal packet loss and alerts regarding "clockwork" and "limoncello"

gpg: Signature made Sun Dec 27 01:44:50 2020 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]

At about 0000Z we started receiving alerts of packet loss and
began investigating. The problem turned out to be internal, and only
between hosts "clockwork" and "limoncello". That is, everything on
both hosts was reachable from outside our network, and also from
inside it, as long as the traffic wasn't between those two hosts.

As there is a monitoring node on "limoncello", a number of alerts
were sent out regarding customer services on "clockwork" that it
considered to be down. They weren't actually down, except as seen
from anything hosted on "limoncello"; the same applied in the other
direction.
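
To illustrate why a single monitoring node reported services as down
when they were still reachable from elsewhere, here is a minimal
Python sketch that classifies a target from several vantage points.
It is not BitFolk's monitoring code: the function name, the
"external" vantage point and the probe results are all hypothetical,
chosen only to resemble the situation described above.

```python
from collections import defaultdict


def classify_targets(probes):
    """Classify each target as "up", "down" or "partial".

    probes maps (vantage, target) -> True if that probe succeeded.
    Returns {target: (verdict, sorted list of vantages that failed)}.
    """
    seen = defaultdict(list)
    for (vantage, target), ok in probes.items():
        seen[target].append((vantage, ok))

    verdicts = {}
    for target, results in seen.items():
        failed = sorted(v for v, ok in results if not ok)
        if not failed:
            verdicts[target] = ("up", [])
        elif len(failed) == len(results):
            # No vantage point could reach it: a real outage.
            verdicts[target] = ("down", failed)
        else:
            # Reachable from some places only: a path problem.
            verdicts[target] = ("partial", failed)
    return verdicts


# Hypothetical probe results; "external" is an invented vantage point.
probes = {
    ("external", "clockwork"): True,
    ("limoncello", "clockwork"): False,
    ("external", "limoncello"): True,
    ("clockwork", "limoncello"): False,
}
verdicts = classify_targets(probes)
```

With these inputs both hosts come out as "partial" rather than
"down", each failing only from the other host, which is the pattern
that points at the path between them rather than at either host.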

I tracked the issue to one of the two bonded switch ports for
"clockwork"; bringing that interface down and up again appears to
have cleared it. That happened at about 0045Z.

If the problem recurs we can take that interface down and have
"clockwork" run on the remaining one until the port or switch can be
changed. If the problem is actually in the NIC of the server itself
things will be trickier, but we'll cross that bridge if we come to it.


https://bitfolk.com/ -- No-nonsense VPS hosting
announce mailing list