WIP: Add avalanche like component for testing #5965

Draft · wants to merge 20 commits into main

7 changes: 5 additions & 2 deletions .gitignore
@@ -7,7 +7,10 @@
/.eventcache
vendor
data-agent

/cmd/benchmark/data/
/cmd/benchmark/main
/cmd/benchmark/grafana-agent-flow
/cmd/benchmark/benchmark
/cmd/agent/agent
/cmd/agentctl/agentctl
/cmd/agent-operator/agent-operator
@@ -24,4 +27,4 @@ cover*.out
.uptodate
node_modules

/docs/variables.mk.local
/docs/variables.mk.local
52 changes: 52 additions & 0 deletions cmd/benchmark/README.md
@@ -0,0 +1,52 @@
# Benchmark notes

These are synthetic benchmarks meant to represent common workloads. They are not meant to be exhaustive or fine-grained;
they give a coarse idea of how the agent behaves in a given situation.

## Prometheus Metrics

### Running the benchmarks

Running `PROM_USERNAME="" PROM_PASSWORD="" ./benchmark.sh` starts the benchmark, which runs for 8 hours. The duration and type of tests
can be adjusted by editing the `metris.sh` file. This starts two Agents and the benchmark runner. Relevant CPU and memory metrics
are sent to the endpoint described in `normal.river`.

TODO: Add mixin for graph I am using

### Adjusting the benchmark

Each benchmark can be adjusted within `test.river`. These settings allow fine-tuning to a specific scenario. Each `prometheus.test.metrics` component
exposes a service discovery URL that is used to collect the targets, as shown in the sketch below.
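
For reference, this is how the configs in this PR consume that URL (mirroring `configs/normal.river`, where the `DISCOVERY` environment variable carries the URL exposed by the test component):

```river
discovery.http "disco" {
  // DISCOVERY points at the service discovery URL exposed by prometheus.test.metrics.
  url = env("DISCOVERY")
}

prometheus.scrape "data" {
  targets         = discovery.http.disco.targets
  forward_to      = [prometheus.remote_write.empty.receiver]
  scrape_interval = "60s"
}
```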

### Benchmark categories

#### prometheus.test.metrics "single"

This roughly represents a single node_exporter and is the simplest use case. Every `10m`, 5% of the metrics are replaced, driven by `churn_percent`.
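
A minimal sketch of what such a block might look like follows; apart from `churn_percent`, every attribute name and value here is an illustrative assumption rather than the component's confirmed schema (the real settings live in `test.river`):

```river
prometheus.test.metrics "single" {
  // Assumed attributes, for illustration only.
  number_of_instances = 1     // one simulated node_exporter
  number_of_metrics   = 1000  // series per instance
  metrics_refresh     = "10m" // how often churn is applied
  churn_percent       = 0.05  // 5% of metrics replaced per refresh
}
```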

#### prometheus.test.metrics "many"

This roughly represents scraping many node_exporter instances in, say, a Kubernetes environment.

#### prometheus.test.metrics "large"

This represents scraping 2 very large instances with 1,000,000 series.

#### prometheus.test.metrics "churn"

This represents a worst-case scenario: 2 large instances with an extremely high churn rate.

### Adjusting the tests

`prometheus.relabel` is often a CPU bottleneck, so adding additional rules lets you measure the impact of heavier relabeling; see the sketch below.
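
For example, appending a rewrite rule like the second one below to the `prometheus.relabel "default"` block in the relabel configs adds one more regex evaluation per sample; the `relabel_stress` label name is made up for illustration:

```river
prometheus.relabel "default" {
  rule {
    source_labels = ["__name__"]
    regex         = "(agent_metric.+)"
    action        = "keep"
  }

  // Added purely to increase per-sample relabel cost.
  rule {
    source_labels = ["__name__"]
    regex         = "(.+)"
    replacement   = "true"
    target_label  = "relabel_stress"
  }

  forward_to = [prometheus.remote_write.empty.receiver]
}
```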

### Rules

There are existing rules to send only the specific metrics that matter to the Prometheus remote write endpoint. These are tagged with the `runtype` and the benchmark name, for instance `normal-large`.

The benchmark starts an endpoint to consume the metrics from `prometheus.test.metrics`; in half the tests it returns HTTP status 200, and in the other half it returns 500.

TODO: Add optional pyroscope profiles


## Loki Logs
61 changes: 61 additions & 0 deletions cmd/benchmark/configs/logs.river
@@ -0,0 +1,61 @@
logging {
  level = "debug"
}

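// Self-monitoring scrape; localhost:12346 is assumed to be the HTTP endpoint of the agent under test.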
prometheus.scrape "scraper" {
targets = concat([{"__address__" = "localhost:12346"}])
forward_to = [prometheus.relabel.mutator.receiver]
scrape_interval = "60s"
}

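// Tag each series with benchmark metadata, then keep only the metrics the benchmark needs.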
prometheus.relabel "mutator" {
rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = "normal"
target_label = "runtype"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NAME")
target_label = "test_name"
}

rule {
source_labels = ["__name__"]
action = "keep"
regex = "(agent_wal_storage_active_series|agent_resources_process_cpu_seconds_total|go_memstats_alloc_bytes|go_gc_duration_seconds_sum|go_gc_duration_seconds_count|loki_source_file_files_active_total|loki_write_encoded_bytes_total|loki_write_sent_bytes_total|loki_sum_source_file_read_bytes_total)"
}

forward_to = [prometheus.remote_write.agent_stats.receiver]
}

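// Ship the benchmark's health metrics to Grafana Cloud.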
prometheus.remote_write "agent_stats" {
endpoint {
url = "https://prometheus-us-central1.grafana.net/api/prom/push"

basic_auth {
username = env("PROM_USERNAME")
password = env("PROM_PASSWORD")
}
}
}

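// Watch the synthetic log files produced by loki.test.logs (see configs/logsgen.river).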
local.file_match "logs" {
path_targets = [
{__path__ = "./data/logs-gen/loki.test.logs.logs/*.log"},
]
}

loki.source.file "tmpfiles" {
targets = local.file_match.logs.targets
forward_to = [loki.write.local.receiver]
}

loki.write "local" {
endpoint {
url = "http://localhost:8888/post"
}
}
10 changes: 10 additions & 0 deletions cmd/benchmark/configs/logsgen.river
@@ -0,0 +1,10 @@
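// Generate synthetic log files: 100 files, 25% of which churn every 1m refresh, writing 100 lines per second.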
loki.test.logs "logs" {
number_of_files = 100
file_churn_percent = .25
file_refresh = "1m"
write_cadence = "1s"
writes_per_cadence = 100
labels = {
"instance" = "localhost",
}
}
74 changes: 74 additions & 0 deletions cmd/benchmark/configs/normal.river
@@ -0,0 +1,74 @@
logging {
  level = "debug"
}

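// Discover the synthetic targets exposed by the prometheus.test.metrics component under test.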
discovery.http "disco" {
url = env("DISCOVERY")
}

prometheus.scrape "scraper" {
targets = concat([{"__address__" = env("HOST")}])
forward_to = [prometheus.relabel.mutator.receiver]
scrape_interval = "60s"
}

prometheus.relabel "mutator" {
rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("RUNTYPE")
target_label = "runtype"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NAME")
target_label = "test_name"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NETWORK_DOWN")
target_label = "remote_write_enable"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("DISCOVERY")
target_label = "discovery"
}

rule {
source_labels = ["__name__"]
action = "keep"
regex = "(agent_wal_storage_active_series|agent_resources_process_cpu_seconds_total|go_memstats_alloc_bytes|go_memstats_heap_inuse_bytes|go_gc_duration_seconds_sum|go_gc_duration_seconds_count)"
}

forward_to = [prometheus.remote_write.agent_stats.receiver]
}

prometheus.remote_write "agent_stats" {
endpoint {
url = "https://prometheus-us-central1.grafana.net/api/prom/push"

basic_auth {
username = env("PROM_USERNAME")
password = env("PROM_PASSWORD")
}
}
}

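// Scrape the synthetic targets and forward them to a local throwaway endpoint.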
prometheus.scrape "data" {
targets = discovery.http.disco.targets
forward_to = [prometheus.remote_write.empty.receiver]
scrape_interval = "60s"
}

prometheus.remote_write "empty" {
endpoint {
url = "http://localhost:8888/post"
}
}
81 changes: 81 additions & 0 deletions cmd/benchmark/configs/relabel_large_cache.river
@@ -0,0 +1,81 @@
discovery.http "disco" {
url = env("DISCOVERY")
}

prometheus.scrape "scraper" {
targets = concat([{"__address__" = env("HOST")}])
forward_to = [prometheus.relabel.mutator.receiver]
scrape_interval = "60s"
}

prometheus.relabel "mutator" {
rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("RUNTYPE")
target_label = "runtype"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NAME")
target_label = "test_name"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NETWORK_DOWN")
target_label = "remote_write_enable"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("DISCOVERY")
target_label = "discovery"
}

rule {
source_labels = ["__name__"]
action = "keep"
regex = "(agent_wal_storage_active_series|agent_resources_process_cpu_seconds_total|go_memstats_alloc_bytes|go_memstats_heap_inuse_bytes|go_gc_duration_seconds_sum|go_gc_duration_seconds_count)"
}

forward_to = [prometheus.remote_write.agent_stats.receiver]
}

prometheus.remote_write "agent_stats" {
endpoint {
url = "https://prometheus-us-central1.grafana.net/api/prom/push"

basic_auth {
username = env("PROM_USERNAME")
password = env("PROM_PASSWORD")
}
}
}

prometheus.scrape "data" {
targets = discovery.http.disco.targets
forward_to = [prometheus.relabel.default.receiver]
scrape_interval = "60s"
}

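// Relabel with an explicit 1,000,000-entry cache; compare with relabel_normal_cache.river.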
prometheus.relabel "default" {
max_cache_size = 1000000

rule {
source_labels = ["__name__"]
regex = "(agent_metric.+)"
action = "keep"
}
forward_to = [prometheus.remote_write.empty.receiver]
}

prometheus.remote_write "empty" {
endpoint {
url = "http://localhost:8888/post"
}
}
79 changes: 79 additions & 0 deletions cmd/benchmark/configs/relabel_normal_cache.river
@@ -0,0 +1,79 @@
discovery.http "disco" {
url = env("DISCOVERY")
}

prometheus.scrape "scraper" {
targets = concat([{"__address__" = env("HOST")}])
forward_to = [prometheus.relabel.mutator.receiver]
scrape_interval = "60s"
}

prometheus.relabel "mutator" {
rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("RUNTYPE")
target_label = "runtype"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NAME")
target_label = "test_name"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("NETWORK_DOWN")
target_label = "remote_write_enable"
}

rule {
source_labels = ["__name__"]
regex = "(.+)"
replacement = env("DISCOVERY")
target_label = "discovery"
}

rule {
source_labels = ["__name__"]
action = "keep"
regex = "(agent_wal_storage_active_series|agent_resources_process_cpu_seconds_total|go_memstats_heap_inuse_bytes|go_memstats_alloc_bytes|go_gc_duration_seconds_sum|go_gc_duration_seconds_count)"
}

forward_to = [prometheus.remote_write.agent_stats.receiver]
}

prometheus.remote_write "agent_stats" {
endpoint {
url = "https://prometheus-us-central1.grafana.net/api/prom/push"

basic_auth {
username = env("PROM_USERNAME")
password = env("PROM_PASSWORD")
}
}
}

prometheus.scrape "data" {
targets = discovery.http.disco.targets
forward_to = [prometheus.relabel.default.receiver]
scrape_interval = "60s"
}

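// Relabel with the default cache size; compare with relabel_large_cache.river.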
prometheus.relabel "default" {
rule {
source_labels = ["__name__"]
regex = "(agent_metric.+)"
action = "keep"
}
forward_to = [prometheus.remote_write.empty.receiver]
}

prometheus.remote_write "empty" {
endpoint {
url = "http://localhost:8888/post"
}
}