Re: [bitfolk] predicted network usage / "Predicted to exceed…

Author: Conrad Wood
Date:  
To: Andy Smith, users
Subject: Re: [bitfolk] predicted network usage / "Predicted to exceed transfer quota"
On Fri, 2022-02-25 at 20:47 +0000, Andy Smith wrote:
> Hi Conrad,
>
> On Fri, Feb 25, 2022 at 04:08:32PM +0000, Conrad Wood wrote:
> > I think I'm missing something. The "predicted" on the panel[1] looks
> > rather different to grafana.
> > The screenshots were taken about 5 seconds apart.
> > Is that expected?
>
> Yes. The prediction algorithm was explained here:
>
>     https://lists.bitfolk.com/lurker/message/20220223.204720.d86cf277.en.html
>
> The two things (emailed notifications about data transfer, and the
> figures on the panel about data transfer) have their predictions
> calculated differently.
>
> For the email notifications:
>
> - First it does a lightweight prediction by taking the sum of data
>   transferred this period, averaging it to a per-second value and
>   then multiplying that by the number of seconds in 30 days.
>
> - If:
>
>     - This prediction indicates a state transition, either from
>       "predicted okay" to "predicted over" or the reverse, AND
>     - You're in the first 15 days of the period, THEN:
>
>   a more heavyweight prediction is done by adding up the actual
>   figures for the last 30 days. That is then used as the prediction.
>
> For the panel:
>
> - Just uses the simple lightweight prediction.
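
A minimal sketch of the two predictions as described above, not the
actual BitFolk code. All names here are hypothetical, and it assumes
that "over" means strictly greater than the quota and that the 30-day
sum is supplied as a callable, so the slow query only runs when needed.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_IN_PERIOD = 30 * SECONDS_PER_DAY   # "number of seconds in 30 days"
HEAVYWEIGHT_CUTOFF = 15 * SECONDS_PER_DAY  # "first 15 days of the period"


def lightweight_prediction(bytes_this_period, seconds_elapsed):
    """Average the transfer so far to a per-second rate, scale to 30 days."""
    if seconds_elapsed <= 0:
        return 0.0  # nothing to extrapolate from yet
    per_second = bytes_this_period / seconds_elapsed
    return per_second * SECONDS_IN_PERIOD


def panel_prediction(bytes_this_period, seconds_elapsed):
    """The panel just uses the simple lightweight prediction."""
    return lightweight_prediction(bytes_this_period, seconds_elapsed)


def email_prediction(bytes_this_period, seconds_elapsed, quota_bytes,
                     was_predicted_over, sum_last_30_days):
    """Prediction used by the notification batch job.

    sum_last_30_days is a hypothetical callable that returns the actual
    transfer over the last 30 days (the expensive query).
    """
    light = lightweight_prediction(bytes_this_period, seconds_elapsed)
    now_predicted_over = light > quota_bytes  # assumption: strict comparison
    state_changed = now_predicted_over != was_predicted_over
    in_first_15_days = seconds_elapsed < HEAVYWEIGHT_CUTOFF
    if state_changed and in_first_15_days:
        # Heavyweight path: use the real figures for the last 30 days.
        return float(sum_last_30_days())
    return light
```

With 3 days of data the lightweight figure is simply ten times the
transfer so far, which is why it can differ so much from a sum over
the last 30 days that includes an earlier busy spell.
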
>
> Reasoning: If we're going to send a notification about a state
> change, either to tell you to worry or to tell you not to worry, it
> would be better to use more accurate figures for that prediction.
> The lightweight prediction might be using only a small amount of
> data: in your case right now, about 3 days' worth to guess about a
> 30-day period.
>
> If you look back at the last email notification sent to you, you'll
> see that it predicted you'll go over again this period, and its
> predicted figure matches what our Prometheus is now telling you,
> because that's always how it's worked.
>
> It's true that the simple prediction for you is right now returning
> a much lower figure. You can look back in your bandwidth graph and
> see why that is: There was a prolonged period of high usage a while
> ago, but within the last 30 days. If we ignore that period then of
> course the prediction will come out much lower. Is it correct to
> ignore that period? Who knows, we just have to pick an algorithm,
> you could argue either way.
>
> So why not always use the more accurate prediction on the panel as
> well?
>
> The reason is historical. The figures are stored in a regular SQL
> database and summing up 30 days of metrics from that source takes an
> appreciable amount of time. This setup pre-dates Prometheus or any
> other sort of time series database we had going.
>
> So, we have only been doing it when it's considered important to do
> so, in the batch job that sends out email notifications about data
> transfer limits, and then the calculated predictions were thrown
> away. It wasn't considered feasible to do those calculations in a
> web page.
>
> By the time you are 15 days into this reporting period the two
> figures will match.
>
> So, what you are seeing is expected but it doesn't mean we can't
> improve things.
>
> I do not want to calculate the official billing figures from
> Prometheus because I don't want to make Prometheus (or Grafana) an
> essential piece of infrastructure.
>
> I could store the calculated predictions back in the database and
> use those on the panel.
>
> It's always worked like this and no one has really noticed in more
> than 10 years, probably because hardly anyone reaches their limits
> and when they do it's usually nearer to the end of the reporting
> period. It should be pretty easy to store the calculated predictions
> though, I've just never thought of doing it before.
>
> Cheers,
> Andy



Understood.
I suggest we let this run for a couple of months and observe.
I agree the prediction does not have to be super accurate, so I would
expect that what we have now should work well.

Thank you!

Conrad