Query prometheus metrics from benchmark #3473

Merged
merged 1 commit into from
Mar 6, 2025

Contributor

@ndr-ds ndr-ds commented Mar 4, 2025

Motivation

Right now, the only way to know whether the benchmark is overwhelming the validators is to check the metrics in Grafana (which have a 2-minute delay) and stop the benchmark manually. This is not ideal: if latency becomes too high, the validators can take a very long time to recover.

Proposal

Introduce a --health_check_endpoints option to linera benchmark. If provided, the benchmark queries the specified metrics host:port pairs to determine the health of the validators. For now, a validator is considered healthy if its p99 proxy latency is below 400 ms.
This queries the proxy's metrics endpoint directly, so it works with all kinds of deployments (Docker Compose, Kubernetes, binaries running locally, etc.). The user is responsible for making sure the metrics endpoint is reachable at the provided host:port pairs (this might require port-forwarding the metrics port, for example).
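As an illustration of what reading histogram data from such an endpoint involves, here is a minimal Python sketch (not the code in this PR) that extracts cumulative bucket counts from a Prometheus text-format response. The metric name `proxy_request_latency` and the helper `parse_histogram_buckets` are hypothetical, and the sketch assumes a single label set per metric.

```python
import re

# Matches bucket lines such as: proxy_request_latency_bucket{le="100"} 42
# (the metric name here is a hypothetical example, not taken from the PR).
BUCKET_LINE = re.compile(r'^(?P<name>\w+)_bucket\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LE_LABEL = re.compile(r'le="([^"]+)"')


def parse_histogram_buckets(text, metric_name):
    """Return sorted (upper_bound, cumulative_count) pairs for one histogram.

    Simplification: series with different label sets are not separated.
    """
    buckets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = BUCKET_LINE.match(line)
        if match is None or match.group("name") != metric_name:
            continue
        le = LE_LABEL.search(match.group("labels"))
        if le is None:
            continue
        # float("+Inf") parses to infinity, which sorts last.
        buckets.append((float(le.group(1)), float(match.group("value"))))
    return sorted(buckets)
```

The response body itself could be fetched with any HTTP client from the `/metrics` path that Prometheus exporters conventionally serve; whether the proxy uses that exact path is an assumption here.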
We query the endpoints once every 5 seconds and calculate each validator's p99 over those last 5 seconds. If any validator is above the 400 ms threshold, the benchmark is stopped automatically. We calculate the p99 by doing the linear interpolation manually, based on the histogram data we get from the endpoints.
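The windowed p99 calculation described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the implementation in this PR: it assumes cumulative buckets as (upper bound in ms, count) pairs, a lower bound of 0 for the first bucket, and the common convention of returning the highest finite bound when the quantile falls in the open-ended bucket. The function names are hypothetical.

```python
import math

P99_THRESHOLD_MS = 400.0  # threshold stated in the proposal


def window_counts(previous, current):
    """Cumulative bucket counts for observations made between two scrapes."""
    prev = dict(previous)
    return [(le, count - prev.get(le, 0.0)) for le, count in current]


def quantile_from_buckets(q, buckets):
    """Estimate the q-quantile from sorted (upper_bound, cumulative_count) pairs."""
    if not buckets:
        return None
    total = buckets[-1][1]
    if total <= 0:
        return None  # no observations in this window
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if math.isinf(upper_bound) or in_bucket <= 0:
                # Cannot interpolate inside an open-ended or empty bucket.
                return lower_bound
            # Linear interpolation between the bucket's two bounds.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


def is_healthy(previous, current):
    """True if the p99 over the last window is below the threshold."""
    p99 = quantile_from_buckets(0.99, window_counts(previous, current))
    return p99 is None or p99 < P99_THRESHOLD_MS
```

For example, if 100 requests arrive in a window, with 90 of them at or under 200 ms and all of them at or under 400 ms, the 99th request falls nine tenths of the way through the 200 to 400 ms bucket, giving an estimated p99 of 380 ms.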

Test Plan

Ran a network locally and ran the benchmark against it. The printed p99 values matched what Grafana reported, and the benchmark was stopped automatically once the threshold was crossed.

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 5a6b566 to 2fc2b51 Compare March 4, 2025 20:38
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch 3 times, most recently from 2666a06 to 25bb2d7 Compare March 4, 2025 21:54
@ndr-ds ndr-ds changed the base branch from 03-04-pretty_print_chain_info_response_on_query_validators to graphite-base/3473 March 5, 2025 14:01
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 25bb2d7 to e14cf17 Compare March 5, 2025 14:01
@ndr-ds ndr-ds force-pushed the graphite-base/3473 branch from 57ce110 to 99e35bc Compare March 5, 2025 14:01
@ndr-ds ndr-ds changed the base branch from graphite-base/3473 to 03-04-end_value_fix_for_linear_bucket_interval March 5, 2025 14:01
@ndr-ds ndr-ds force-pushed the 03-04-end_value_fix_for_linear_bucket_interval branch from 99e35bc to 1b69a51 Compare March 5, 2025 14:52
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from e14cf17 to 8f5e1c6 Compare March 5, 2025 14:52
@ndr-ds ndr-ds changed the base branch from 03-04-end_value_fix_for_linear_bucket_interval to graphite-base/3473 March 5, 2025 15:50
@ndr-ds ndr-ds force-pushed the graphite-base/3473 branch from 1b69a51 to ba839b5 Compare March 5, 2025 16:15
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 8f5e1c6 to 32b2bce Compare March 5, 2025 16:15
@ndr-ds ndr-ds changed the base branch from graphite-base/3473 to 03-04-end_value_fix_for_linear_bucket_interval March 5, 2025 16:15
@ndr-ds ndr-ds force-pushed the 03-04-end_value_fix_for_linear_bucket_interval branch from ba839b5 to d400116 Compare March 5, 2025 17:31
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 32b2bce to 26e8c6a Compare March 5, 2025 17:31
@ndr-ds ndr-ds force-pushed the 03-04-end_value_fix_for_linear_bucket_interval branch from d400116 to 34f9c49 Compare March 5, 2025 17:34
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch 2 times, most recently from 5824eec to 73e32e8 Compare March 5, 2025 17:43
@ndr-ds ndr-ds force-pushed the 03-04-end_value_fix_for_linear_bucket_interval branch from 34f9c49 to 456101c Compare March 5, 2025 17:43
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 73e32e8 to bb69877 Compare March 5, 2025 17:45
@ndr-ds ndr-ds force-pushed the 03-04-end_value_fix_for_linear_bucket_interval branch from 456101c to 4082ac3 Compare March 5, 2025 17:45
@ndr-ds ndr-ds changed the base branch from 03-04-end_value_fix_for_linear_bucket_interval to graphite-base/3473 March 5, 2025 18:25
@ndr-ds ndr-ds force-pushed the graphite-base/3473 branch from 4082ac3 to b7e963a Compare March 5, 2025 18:27
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from bb69877 to 6caf011 Compare March 5, 2025 18:27
@ndr-ds ndr-ds changed the base branch from graphite-base/3473 to main March 5, 2025 18:28
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 6caf011 to 76c77e7 Compare March 5, 2025 18:28
@ndr-ds ndr-ds changed the base branch from main to graphite-base/3473 March 5, 2025 22:52
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 76c77e7 to 24ab520 Compare March 5, 2025 22:52
@ndr-ds ndr-ds changed the base branch from graphite-base/3473 to 03-05-adding_scylladb_cpu_usage_graph_some_other_changes March 5, 2025 22:52
@ndr-ds ndr-ds changed the base branch from 03-05-adding_scylladb_cpu_usage_graph_some_other_changes to graphite-base/3473 March 5, 2025 22:52
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 24ab520 to 37b5af1 Compare March 5, 2025 22:52
@ndr-ds ndr-ds force-pushed the graphite-base/3473 branch from 87cb69a to fcbe2c4 Compare March 5, 2025 22:52
@ndr-ds ndr-ds changed the base branch from graphite-base/3473 to main March 5, 2025 22:53
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 37b5af1 to 6508b5c Compare March 5, 2025 22:53
Contributor

@ma2bd ma2bd left a comment

nice!

@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 6508b5c to 6651527 Compare March 6, 2025 01:34
@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch 3 times, most recently from e016dce to 65d1362 Compare March 6, 2025 15:07
Contributor Author

ndr-ds commented Mar 6, 2025

Merge activity

  • Mar 6, 10:59 AM EST: A user started a stack merge that includes this pull request via Graphite.
  • Mar 6, 10:59 AM EST: Graphite rebased this pull request as part of a merge.
  • Mar 6, 11:00 AM EST: A user merged this pull request with Graphite.

@ndr-ds ndr-ds force-pushed the 02-24-query_prometheus_metrics_from_benchmark branch from 65d1362 to f3fd01e Compare March 6, 2025 15:59
@ndr-ds ndr-ds merged commit 1d4ee73 into main Mar 6, 2025
25 checks passed
@ndr-ds ndr-ds deleted the 02-24-query_prometheus_metrics_from_benchmark branch March 6, 2025 16:00
3 participants