Nathan Gauër 13b44283e9
[CI] Add queue size, running count metrics (#122714)
This commits allows the container to report 3 additional metrics at
every sampling event:
- a heartbeat
- the size of the workflow queue (filtered)
- the number of running workflows (filtered)

The heartbeat is a simple metric allowing us to monitor the metrics
health. Before this commit, a new metrics was pushed only when a
workflow was completed. This meant we had to wait a few hours
before noticing if the metrics container was unable to push metrics.

In addition to this, this commits adds a sampling of the workflow
queue size and running count. This should allow us to better understand
the load, and improve the autoscale values we pick for the cluster.

---------

Signed-off-by: Nathan Gauër <brioche@google.com>
2025-01-16 11:41:49 +01:00
..