Re: [bitfolk] VPS failover techniques

Author: admins
Date:
To: users
Subject: Re: [bitfolk] VPS failover techniques

In a previous life we ran a pair of HA load balancers, with a
serial-connected heartbeat and STONITH, in a primary/backup config, as a
front end to a whole bunch of ISP-type services.

These then pointed at the services (multiples of each). The published
IP/s on the load balancers were virtual or floating and moved between
the two.

It gave us a lot of flexibility to adjust the load balancing parameters,
sometimes pressing the same into service for fail-over of a back-end
service, or dropping back-end services out of the list for maintenance
or reloads. Doing this kills any TCP sessions, though, and it is up to
the client to re-establish them; the load balancers just pointed the
client at a different service when the reconnect happened. The state was
lost, though, so each reconnect was a new application session. For web
servers or squid proxies this does not matter much, though.
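
As a rough illustration of the client side of that (not anything we
actually ran), here is a minimal Python sketch of a reconnect loop with
backoff. The function name, the parameters and the 192.0.2.10 address in
the usage line are all made up for the example.

```python
import socket
import time


def connect_with_retry(address, attempts=5, delay=1.0,
                       connector=socket.create_connection,
                       sleep=time.sleep):
    """Try to (re)connect to the published IP, backing off between tries.

    `address` is a (host, port) tuple. `connector` and `sleep` can be
    swapped out so the loop can be exercised without a real network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # Every success is a brand-new TCP (and application) session;
            # nothing from the old session is carried over.
            return connector(address)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Usage would be something like connect_with_retry(("192.0.2.10", 80)).
The client never needs to know which back end it lands on, only the
published address.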

Basically if the backend services were load balanced off that front end
then when one failed at least some service was maintained via what
remained and we could drop the failed one out of the load balancing list
once we spotted it (Nagios monitoring). There is no reason why this
could not be scripted. Monit as such was not available to us at that
point in time.
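
A sketch of what such a script might boil down to, in Python: check each
back end with a plain TCP connect and keep only the ones that answer.
The function names are hypothetical, and falling back to the full list
when everything looks dead is my own assumption for the example, not
something our setup did.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_backends(backends, check=tcp_alive):
    """Return the (host, port) backends that pass the check, in order.

    If every backend fails, return the full list unchanged, on the
    assumption that a monitoring fault is more likely than all of them
    dying at once and an empty pool helps nobody.
    """
    alive = [b for b in backends if check(*b)]
    return alive or list(backends)
```

The result would then be fed to whatever rewrites the load balancer's
list and reloads it, which is the part that depends entirely on the load
balancer in use.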

This might be useful as a paid-for service for an ISP to offer, but it
is overkill for a single request from a customer.

Alternatively, if virtual/floating IPs were available there is no reason
you could not run a similar setup on your pair of VPSes as a
primary/backup pair: STONITH running on the same pair that runs the
services, with a direct heartbeat interconnect on a private LAN/VLAN or
some such if a dedicated serial heartbeat link is not available.

But as suggested earlier your provider would need to be offering those
extra things (heartbeat link, and virtual/floating IP).

You would of course need to keep the contents of the servers
sufficiently synced in readiness for fail-over taking place, and it all
would have to reside at the same provider.

If you tried anything like this across different providers (unlikely to
be possible with floating/virtual IPs) there is a very real risk that
network congestion could mess with your heartbeat link with unforeseen
results and loss of service.
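
To show why a flaky heartbeat link is a problem, here is a toy Python
model of the backup node's side of it. It is purely illustrative: the
class and its parameters are invented for the example, and real
Heartbeat or keepalived do a great deal more (fencing via STONITH before
taking over, for one).

```python
class HeartbeatMonitor:
    """Track heartbeats from the peer and decide when to take over."""

    def __init__(self, interval=1.0, dead_after=5):
        self.interval = interval      # expected seconds between beats
        self.dead_after = dead_after  # missed beats before failing over
        self.last_beat = None

    def beat(self, now):
        """Record a heartbeat received at time `now` (in seconds)."""
        self.last_beat = now

    def missed(self, now):
        """Whole heartbeat intervals elapsed since the last beat."""
        if self.last_beat is None:
            return 0  # never heard from the peer; nothing to count yet
        return int((now - self.last_beat) // self.interval)

    def peer_is_dead(self, now):
        """True once enough consecutive beats have gone missing."""
        return self.missed(now) >= self.dead_after
```

From the backup's point of view, heartbeats delayed or dropped by
congestion look exactly the same as a dead primary. Set dead_after too
low on an unreliable link and both nodes can end up believing they are
the primary.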

Heartbeat and STONITH are part (or were last I looked) of the linked
Linux HA bunch. It was a long while ago that I last looked at these,
though. YMMV

Cheers

Kirbs

On 21/02/2019 14:08, Andy Smith wrote:
> Hi Chris,
>
> On Thu, Feb 21, 2019 at 01:19:14PM +0000, Chris Smith via users wrote:
>> I’m exploring the idea of using two VPSs on different hosts to
>> implement some sort of failover mechanism. Is anyone here doing
>> something similar, or have any recommendations?
> I do it myself but I'm not aware of any customers doing it.
>
> All solutions in this space are going to require paying for multiple
> VPSes, and I guess that is the major turn-off for people.
>
> As a customer you cannot yet by yourself programmatically float an IP
> address between two different VPSes, but it's what I do with the aid of
> a script. I have asked in the past if any customers wanted to explore
> that, in which case I would be able to turn that into a service.
> Probably for free given that you need to pay for an extra VPS and at
> least one extra IP.
>
> A lot of this depends on what the very vague and high level term
> "failover" means to you.
>
> As one example architecture, I have two VMs each of which runs haproxy.
> The haproxy fronts various different TCP services such as some web
> sites, spamd, entropy service etc. There are multiple backend VMs
> running each service.
>
> Clients talk to the haproxy IP. The haproxy health-checks backends and
> decides where to proxy the client connection to.
>
> There is also a keepalived on each haproxy host which in the event of
> the live haproxy host becoming unavailable moves the floating IPs to the
> other haproxy host. By this means it is possible for me to take some of
> the backend VMs out of service without clients noticing (connected
> clients will reconnect, however).
>
> So, downsides here:
>
> - Added complexity, although once you understand them haproxy and
> keepalived are pretty simple sturdy pieces of software
>
> - Have to pay for an extra VM sitting around doing nothing until it's
> needed. How much is the continuity worth it to you though? I mean,
> minimal BitFolk VM, £6.49+VAT/mo., arguably in many contexts I could
> charge more than that for writing this email… 😀
>
> - It's all at BitFolk. You survive death of a BitFolk host, but a lot of
> disruptions affect entire colo provider / site.
>
> Maybe that's excessive work / expenditure for the level of resilience
> you desire though. An essential first step is deciding what it is you
> want to achieve.
>
> The main reason I put floating IPs in front of customer-visible services
> is because if I don't then customers will see errors and problems when I
> do maintenance work either on the services themselves or when I reboot a
> whole BitFolk server. In truth I think that a half hour unavailability
> of spamd, apt-cacher, entropy etc is bearable but I know I will get
> complaints and queries about it and so they have floating IPs just so I
> don't have to deal with that.
>
> BitFolk's resolvers are a different matter. They can't be unavailable
> for half an hour, or even minutes really. At the moment there's four of
> them and they live behind a Pacemaker cluster that always ensures that
> two of them are available, again by use of floating IPs. This is very
> complex and in hindsight I wish I had not done this. keepalived probably
> would have sufficed. This cluster needs replacing due to its constituent
> VMs being obsolete OS versions and its next incarnation most likely will
> not be a full Pacemaker cluster.
>
> Maybe you are only trying to beat the catastrophic failure of a piece of
> BitFolk's hardware.
>
> In such a case, we try to limit the downtime to a handful of hours. We
> have spare hardware, we hope that we could just have someone insert the
> storage into a spare's chassis and boot the server again. It becomes
> trickier if we think of a case where both the SSDs are destroyed. In
> that case your data is gone; we can boot the OS but after many hours all
> you would get is a clean VPS account.
>
> In terms of resilience then, "have backups" is a really good second
> step, as you can put your stuff back on BitFolk or any of the other
> virtual machine hosting providers.
>
> Beyond that, maybe you are thinking that if there were some sort of
> storage snapshot you could at least boot your single VM on a different
> piece of hardware pretty quickly resulting in only minutes of downtime
> without having to pay for any extra VMs or doing anything particularly
> complicated.
>
> There currently is no such facility for this at BitFolk though it
> doesn't seem that hard to do. I can even probably implement migration
> with only a suspend/restore while storage deltas are transferred
> (typically <10 secs pause). The main issue about that from my point of
> view is, I still need to reserve storage and memory for you on another
> host. The only thing saved from my point of view is CPU, and we're not
> short of CPU. So how can I make that service available without charging
> almost the same as an extra VPS?
>
> People using cloud providers often solve these problems by spinning up
> new guests as and when needed. BitFolk is not a cloud provider and
> pivoting into that space is probably not something that can happen any
> time soon, if ever, so unfortunately exploration of solutions in that
> direction will be limited.
>
> Most VM-as-cheap-colo-style providers like BitFolk do not offer live
> migration and migration-in-event-of-failure products, probably because
> of it costing almost the same as two VMs to begin with. Many more do
> offer IPs that you can float about between your VMs programmatically
> and/or by API. The latter sounds a lot more appealing to me than the
> former, but who knows, maybe it could be a way for BitFolk to
> differentiate itself in the marketplace. Something that is sorely
> needed.
>
> As I say, knowing your requirements is going to be a bare minimum for
> you to make progress here regardless of what provider you use, but also
> on a personal level I would be interested to hear what your goals and
> requirements are.
>
> Cheers,
> Andy
>
>
> _______________________________________________
> users mailing list
> users@???
> https://lists.bitfolk.com/mailman/listinfo/users

--
admins@???
www.sheffieldhackspace.org.uk
