add kube prometheus stack #248

Merged
merged 38 commits into from
Feb 1, 2025
05c09ad
initial setup
Jan 20, 2025
f157656
barman
Jan 20, 2025
5a90afb
feat(container): update ghcr.io/onedr0p/actions-runner ( 2.321.0 → 2.…
renovate[bot] Jan 26, 2025
a9f2ecb
fix(container): update ghcr.io/onedr0p/prowlarr-develop ( 1.30.1.4928…
renovate[bot] Jan 26, 2025
c6af0f8
fix(container): update ghcr.io/onedr0p/home-assistant ( 2025.1.2 → 20…
renovate[bot] Jan 26, 2025
8e760b7
fix(container): update ghcr.io/coder/code-server ( 4.96.2 → 4.96.4 ) …
renovate[bot] Jan 26, 2025
bf5114b
feat(helm): update external-secrets ( 0.12.1 → 0.13.0 ) (#224)
renovate[bot] Jan 26, 2025
51dce08
chore(mise): upgrade dependencies (#219)
github-actions[bot] Jan 26, 2025
6325cae
fix(helm): update grafana ( 8.8.4 → 8.8.5 ) (#223)
renovate[bot] Jan 26, 2025
5a1f4c9
fix(helm): update cilium ( 1.16.5 → 1.16.6 ) (#221)
renovate[bot] Jan 26, 2025
778e92e
fix(helm): update coredns ( 1.37.2 → 1.37.3 ) (#222)
renovate[bot] Jan 26, 2025
1f21bf6
fix(container): update ghcr.io/onedr0p/sonarr-develop ( 4.0.12.2866 →…
renovate[bot] Jan 26, 2025
433adc0
fix(container): update jellyfin/jellyfin ( 10.10.3 → 10.10.5 ) (#230)
renovate[bot] Jan 26, 2025
de2f8f9
get rid of templates dir
Jan 26, 2025
cecf967
disable db
Jan 26, 2025
7b5d53d
rework postgres (#231)
RonaldPhilipsen Jan 26, 2025
72547d5
do not reference 16
Jan 26, 2025
f688de1
feat(helm): update coredns ( 1.37.3 → 1.38.1 ) (#232)
renovate[bot] Jan 27, 2025
28f6b89
chore(mise): upgrade dependencies (#233)
github-actions[bot] Jan 29, 2025
b967361
chore(mise): upgrade dependencies (#242)
github-actions[bot] Jan 30, 2025
2a07e47
feat(helm): update coredns ( 1.38.1 → 1.39.0 ) (#241)
renovate[bot] Jan 30, 2025
65c828c
fix(helm): update openebs ( 4.1.2 → 4.1.3 ) (#240)
renovate[bot] Jan 30, 2025
4e97635
fix(helm): update external-dns ( 1.15.0 → 1.15.1 ) (#239)
renovate[bot] Jan 30, 2025
67d4fef
chore(container): update ghcr.io/onedr0p/sonarr-develop ( b2b3e30 → 6…
renovate[bot] Jan 30, 2025
9afb6e2
chore(container): update ghcr.io/onedr0p/sabnzbd ( 4188d3c → fd85776 …
renovate[bot] Jan 30, 2025
da7744d
chore(container): update ghcr.io/onedr0p/radarr-develop ( 64364aa → f…
renovate[bot] Jan 30, 2025
e6935aa
chore(container): update ghcr.io/onedr0p/prowlarr-develop ( 1cf5d5e →…
renovate[bot] Jan 30, 2025
a21d88c
chore(container): update ghcr.io/onedr0p/home-assistant ( 0d20c91 → 6…
renovate[bot] Jan 30, 2025
23fd847
Update flux-diff.yaml
RonaldPhilipsen Jan 30, 2025
b77c2de
fix(container): update ghcr.io/siderolabs/installer ( v1.9.2 → v1.9.3 )
renovate[bot] Jan 30, 2025
c3f82fd
chore(mise): upgrade dependencies (#243)
github-actions[bot] Feb 1, 2025
d81683a
feat(helm): update intel-device-plugins-operator ( 0.31.1 → 0.32.0 ) …
renovate[bot] Feb 1, 2025
e7e80ec
feat(helm): update intel-device-plugins-gpu ( 0.31.1 → 0.32.0 ) (#245)
renovate[bot] Feb 1, 2025
fc1dec5
fix(helm): update grafana ( 8.8.5 → 8.8.6 ) (#244)
renovate[bot] Feb 1, 2025
efa51c6
fix(container): update docker.io/cloudflare/cloudflared ( 2025.1.0 → …
renovate[bot] Feb 1, 2025
0dfca03
fix(container): update ghcr.io/onedr0p/sonarr-develop ( 4.0.12.2892 →…
renovate[bot] Feb 1, 2025
a944532
add kube-prometheis-stack
Feb 1, 2025
41b7b0f
Merge branch 'main' into add-kube-prometheus-stack
RonaldPhilipsen Feb 1, 2025
@@ -0,0 +1,85 @@
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/monitoring.coreos.com/alertmanagerconfig_v1alpha1.json
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager
spec:
  route:
    groupBy: ["alertname", "job"]
    groupInterval: 10m
    groupWait: 1m
    # The default receiver must match a name under spec.receivers.
    receiver: email
    repeatInterval: 12h
    routes:
      - receiver: "null"
        matchers:
          - name: alertname
            value: InfoInhibitor
            matchType: =
      - receiver: heartbeat
        groupInterval: 5m
        groupWait: 0s
        repeatInterval: 5m
        matchers:
          - name: alertname
            value: Watchdog
            matchType: =
      - receiver: email
        matchers:
          - name: severity
            value: critical
            matchType: =
  inhibitRules:
    - equal: ["alertname", "namespace"]
      sourceMatch:
        - name: severity
          value: critical
          matchType: =
      targetMatch:
        - name: severity
          value: warning
          matchType: =
  receivers:
    - name: "null"
    - name: heartbeat
      webhookConfigs:
        - urlSecret:
            name: &secret alertmanager-secret
            key: ALERTMANAGER_HEARTBEAT_URL
    - name: email
      emailConfigs:
        # Whether to notify about resolved alerts.
        - sendResolved: true
          to: 'alerts@${SECRET_DOMAIN}'
          from: 'alertmanager@${SECRET_DOMAIN}'
          hello: k8s@${SECRET_DOMAIN}
          # The smarthost and SMTP sender used for mail notifications.
          smarthost: ${ALERTMANAGER_SMTP_HOST}
          authUsername: ${ALERTMANAGER_SMTP_USERNAME}
          # Secret key selector: `name` is the Secret, `key` is the entry inside it.
          authPassword:
            name: *secret
            key: ALERTMANAGER_SMTP_PASSWORD
          text: >-
            [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
            {{ .CommonLabels.alertname }}
          html: |-
            {{- range .Alerts }}
              {{- if ne .Annotations.description "" }}
                {{ .Annotations.description }}
              {{- else if ne .Annotations.summary "" }}
                {{ .Annotations.summary }}
              {{- else if ne .Annotations.message "" }}
                {{ .Annotations.message }}
              {{- else }}
                Alert description not available
              {{- end }}
              {{- if gt (len .Labels.SortedPairs) 0 }}
                <small>
                  {{- range .Labels.SortedPairs }}
                    <b>{{ .Name }}:</b> {{ .Value }}
                  {{- end }}
                </small>
              {{- end }}
            {{- end }}

@@ -0,0 +1,20 @@
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/external-secrets.io/externalsecret_v1beta1.json
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: alertmanager
spec:
  refreshInterval: 5m
  secretStoreRef:
    kind: ClusterSecretStore
    name: onepassword
  target:
    name: alertmanager-secret
    template:
      data:
        ALERTMANAGER_HEARTBEAT_URL: "{{ .ALERTMANAGER_HEARTBEAT_URL }}"
        ALERTMANAGER_SMTP_PASSWORD: "{{ .ALERTMANAGER_SMTP_PASSWORD }}"
  dataFrom:
    - extract:
        key: alertmanager
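For orientation, the ExternalSecret above should result in a Secret shaped roughly like the sketch below, created in the same namespace. This is illustrative only: the values are hypothetical placeholders for whatever the `alertmanager` item in the `onepassword` store holds, and the real object would carry base64-encoded `data` plus metadata added by External Secrets.

```yaml
# Illustrative only: approximate shape of the Secret that the
# ExternalSecret above is expected to produce. Values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-secret
type: Opaque
stringData:
  ALERTMANAGER_HEARTBEAT_URL: "<heartbeat webhook URL from the secret store>"
  ALERTMANAGER_SMTP_PASSWORD: "<SMTP password from the secret store>"
```

These are the two keys that the AlertmanagerConfig receivers look up in `alertmanager-secret`.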
148 changes: 148 additions & 0 deletions kubernetes/apps/observability/kube-prometheus-stack/helmrelease.yaml
@@ -0,0 +1,148 @@
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: kube-prometheus-stack
spec:
  interval: 30m
  chart:
    spec:
      chart: kube-prometheus-stack
      version: 68.2.1
      sourceRef:
        kind: HelmRepository
        name: prometheus-community
        namespace: flux-system
  install:
    crds: Skip
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    crds: Skip
    remediation:
      strategy: rollback
      retries: 3
  dependsOn:
    - name: kube-prometheus-stack-crds
      namespace: observability
  values:
    crds:
      enabled: false
    cleanPrometheusOperatorObjectNames: true
    alertmanager:
      ingress:
        enabled: true
        ingressClassName: internal
        hosts: ["alertmanager.${SECRET_DOMAIN}"]
        pathType: Prefix
      alertmanagerSpec:
        alertmanagerConfiguration:
          name: alertmanager
          global:
            resolveTimeout: 5m
        externalUrl: https://alertmanager.${SECRET_DOMAIN}
        storage:
          volumeClaimTemplate:
            spec:
              storageClassName: nfs-provision
              resources:
                requests:
                  storage: 1Gi
    kubeApiServer:
      serviceMonitor:
        selector:
          k8s-app: kube-apiserver
    kubeScheduler:
      service:
        selector:
          k8s-app: kube-scheduler
    kubeControllerManager: &kubeControllerManager
      service:
        selector:
          k8s-app: kube-controller-manager
    kubeEtcd:
      <<: *kubeControllerManager # etcd runs on control plane nodes
    kubeProxy:
      enabled: false
    prometheus:
      ingress:
        enabled: true
        ingressClassName: internal
        hosts: ["prometheus.${SECRET_DOMAIN}"]
        pathType: Prefix
      prometheusSpec:
        podMonitorSelectorNilUsesHelmValues: false
        probeSelectorNilUsesHelmValues: false
        ruleSelectorNilUsesHelmValues: false
        scrapeConfigSelectorNilUsesHelmValues: false
        serviceMonitorSelectorNilUsesHelmValues: false
        enableAdminAPI: true
        walCompression: true
        enableFeatures:
          - memory-snapshot-on-shutdown
        retention: 14d
        retentionSize: 50GB
        resources:
          requests:
            cpu: 100m
          limits:
            memory: 2000Mi
        storageSpec:
          volumeClaimTemplate:
            spec:
              storageClassName: nfs-provision
              resources:
                requests:
                  storage: 50Gi
    prometheus-node-exporter:
      fullnameOverride: node-exporter
      prometheus:
        monitor:
          enabled: true
          relabelings:
            - action: replace
              regex: (.*)
              replacement: $1
              sourceLabels: ["__meta_kubernetes_pod_node_name"]
              targetLabel: kubernetes_node
    kube-state-metrics:
      fullnameOverride: kube-state-metrics
      metricLabelsAllowlist:
        - pods=[*]
        - deployments=[*]
        - persistentvolumeclaims=[*]
      prometheus:
        monitor:
          enabled: true
          relabelings:
            - action: replace
              regex: (.*)
              replacement: $1
              sourceLabels: ["__meta_kubernetes_pod_node_name"]
              targetLabel: kubernetes_node
    grafana:
      enabled: false
      forceDeployDashboards: true
    additionalPrometheusRulesMap:
      dockerhub-rules:
        groups:
          - name: dockerhub
            rules:
              - alert: DockerhubRateLimitRisk
                annotations:
                  summary: Kubernetes cluster Dockerhub rate limit risk
                expr: count(time() - container_last_seen{image=~"(docker.io).*",container!=""} < 30) > 100
                labels:
                  severity: critical
      oom-rules:
        groups:
          - name: oom
            rules:
              - alert: OomKilled
                annotations:
                  summary: Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes.
                expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1
                labels:
                  severity: critical
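One way to sanity-check the `OomKilled` expression outside the cluster is a `promtool` rule unit test. The sketch below is an assumption-laden example, not part of this PR: the file names are hypothetical, the `oom` group is assumed to have been copied into a plain Prometheus rule file (top-level `groups:` key), and the series labels are made-up test data.

```yaml
# oom-rules_test.yaml (hypothetical file name); run with:
#   promtool test rules oom-rules_test.yaml
# Assumes the `oom` group above was copied into oom-rules.yaml as a plain
# Prometheus rule file with a top-level `groups:` key.
rule_files:
  - oom-rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # Restart counter: 0 for minutes 0-5, then 1 from minute 6 onward.
      - series: 'kube_pod_container_status_restarts_total{namespace="default",pod="app-0",container="app"}'
        values: '0+0x5 1+0x10'
      # Last termination reason stays OOMKilled for the whole test window.
      - series: 'kube_pod_container_status_last_terminated_reason{namespace="default",pod="app-0",container="app",reason="OOMKilled"}'
        values: '1+0x16'
    alert_rule_test:
      # At 15m the counter is 1 and its value 10 minutes earlier is 0.
      - eval_time: 15m
        alertname: OomKilled
        exp_alerts:
          - exp_labels:
              severity: critical
              namespace: default
              pod: app-0
              container: app
            exp_annotations:
              summary: Container app in pod default/app-0 has been OOMKilled 1 times in the last 10 minutes.
```

The test feeds one restart inside the 10-minute window together with an `OOMKilled` termination reason, which is the combination the rule's `and ignoring (reason)` join is meant to catch.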
@@ -0,0 +1,8 @@
---
# yaml-language-server: $schema=https://json.schemastore.org/kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ./alertmanagerconfig.yaml
- ./externalsecret.yaml
- ./helmrelease.yaml
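The manifests in this directory rely on `${SECRET_DOMAIN}`, `${ALERTMANAGER_SMTP_HOST}` and `${ALERTMANAGER_SMTP_USERNAME}` being filled in by Flux post-build substitution. The Flux Kustomization that does this is not part of the diff, so the sketch below is only a guess at its shape: every name marked "assumed" is hypothetical, and only the `path` is taken from the file paths shown in this PR.

```yaml
# Hypothetical Flux Kustomization that would apply this directory and
# substitute ${...} placeholders from the cluster secrets.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kube-prometheus-stack        # assumed name
  namespace: flux-system             # assumed namespace
spec:
  interval: 30m
  path: ./kubernetes/apps/observability/kube-prometheus-stack
  prune: true
  targetNamespace: observability     # assumed from the app path
  sourceRef:
    kind: GitRepository
    name: flux-system                # assumed source name
  postBuild:
    substituteFrom:
      - kind: Secret
        name: cluster-secrets        # assumed name of the decrypted SOPS secret
```

Substitution only touches `${VAR}` placeholders, so the Go template expressions in the Alertmanager `text` and `html` fields pass through unchanged.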
9 changes: 6 additions & 3 deletions kubernetes/flux/meta/settings/cluster-secrets.sops.yaml
@@ -14,6 +14,9 @@ stringData:
CLUSTER_SVC_V6_PREFIX: ENC[AES256_GCM,data:qVmaFX2V2/TF1z9gLij+ZEzblucO,iv:T8UYxEN8r1A7nSqalS7Mxw0Dn8saDKckYFbHP+V38JM=,tag:RJswikeCI4a0xZv4i7Yegg==,type:str]
CLUSTER_LBA_V6_CIDR: ENC[AES256_GCM,data:BUDk53jJv3VYKiMPaHwh69omm7QCB8zmksNy,iv:rTkABwkYE84F36OrY1AsUdx1/3EryaCo8in91Vqwuxk=,tag:Ot/288eN/+u4T9Wg8otezA==,type:str]
CLUSTER_NODE_V6_CIDR: ENC[AES256_GCM,data:9EtUqN4vA5pzYGPigwqhVc8oMw==,iv:rpnJtQ7E1sW/D7IYxOYtA7TU+cl+tMyAe3oEZ+Kgqks=,tag:n0muTnj1K0dgJgQjcUiuDQ==,type:str]
#ENC[AES256_GCM,data:brFBypll5QOb7yyqt/gHs5rH75+FXbW753m8,iv:wC1nkZUN3nBS+7ZCvGi1K8aYWXG7E0Ywr+H/vZORzAM=,tag:ZN3sr+XSdxDz9zPO+vRZFw==,type:comment]
ALERTMANAGER_SMTP_USERNAME: ENC[AES256_GCM,data:33AiYpDOJ41hHhTgfLsGUglWVk8KwVw=,iv:MBDtmvhgo4urPMHJDRIgPmS5avRg8//5N+YhWeECqtw=,tag:aP6ht3OlN902f9877OaAng==,type:str]
ALERTMANAGER_SMTP_HOST: ENC[AES256_GCM,data:O8rXlZjoe9xwRqXd0Iy7eNS4,iv:oJHPANSEbV2LYf0+z8JQS4kK25gZoOS66STciO4yneI=,tag:7svebzCwZU5JbhTLgrFtiw==,type:str]
sops:
kms: []
gcp_kms: []
@@ -29,9 +32,9 @@ sops:
dEJCQ0VzcEVlWmdDYUs5Nm9jYTVXckkKr8OGj284W6dhf5uUFtpwPX1eaz0dYWx2
uy6dvYEY+SSVSGaojydt8IFU80vhaQIslI2A7hIjNmGY6s5Pl2Zpnw==
-----END AGE ENCRYPTED FILE-----
lastmodified: "2025-01-18T21:51:24Z"
mac: ENC[AES256_GCM,data:rmU+URrHaloXPKthGXfStu4T2/2XhL7NYrwK5ZjUkFmFGNCHlihEjpOW+gowPPA2Dhb7I6wVCAT9Ix+2ir4EYi+xx3Q3Zkbc4dh+QzvbgXnFuWQcTgb6l8ePpsNDVVWDz6fRyI/1m+bky67vhqRXmXjJglxnD+ZIEIBOIdI2bsA=,iv:sklvnbvwADKvrr2LlLCtmCLFOPYqSVwFkzsv3xV0mHE=,tag:9B3AxDUa1CMenwatD77pVA==,type:str]
lastmodified: "2025-02-01T13:32:02Z"
mac: ENC[AES256_GCM,data:ww6bzEoYf0i2ChcHXsQzv1j4ijpoO5O/3o5r1urAckvH9UnO5Dg2mDd8wv2ZZSbueibBpgJAV/V+FpUPIyAqaN6m5aLsGHSz/usfhz62fCownI9zv/gnfCbvGNTa19EL5Cnniv6gc5dtUZOlkyMOfmO0Ps++fsgeG1TyF3q+gZM=,iv:eRcpVyQUzr1YeIpurtqklMKv3y3F2Vn+oiiTIddfKWk=,tag:6+cuyd7fLGBGBb4jNqoENg==,type:str]
pgp: []
encrypted_regex: ^(data|stringData)$
mac_only_encrypted: true
version: 3.9.3
version: 3.9.4