Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Backend: Integrate alert manager to monitoring setup (#3507)
* add view name to statsd metrics
* cleanup unnecessary files with gitignore and fix import style
* switch to views instead of endpoint for metrics
* fix import order
* rebase on top of statsd
* add pushgateway for django-prometheus and export sample push custom metrics
* add monitoring for submission queue using prometheus counters
* fix staging configs and add labels to queue counters
* fix staging configs and add labels to queue counters
* fix tests for submission worker
* push metrics to gateway using prometheus client
* cleanup unnecessary files
* fix push metrics key for submission worker
* fix import order
* cleanup unnecessary files with gitignore
* change counters name for queue monitoring
* add view name to statsd metrics
* cleanup unnecessary files with gitignore and fix import style
* add grafana env variables for staging
* add configs to deploy gateway and nodeexporter to staging setup
* fix configs for staging server
* fix staging configs and add labels to queue counters
* Integrate alert manager to monitoring setup
* fix configs for staging server
* add newlines to files and update gitignore
* fix configs and alert templates
* add secret file for alertmanager config for staging
* cleanup unnecessary files
* cleanup unnecessary files with gitignore
* fix alertmanager rules for statsd metrics
* fix dev uwsgi settings
* fix route for exposing alertmanager on staging
* rebase on top of node-exporter
* update auto-deploy command
* rebase on top of master
* fix alert rules
* change alert message for api threshold
* revert changes for setting alertmanager on same instance
* fix configs of alertmanager for new nginx-ingress
* fix indentation of autodeployment script
* add auto deploy commands to setup alertmanager
* rebase after removing pushgateway
* fix alertmanager configs to keep dev and staging consistent
* fix alertmanager route names
* remove default alertmanager endpoint configs
* cleanup files
* add default receiver to alertmanager
* change name of alertmanager config file

Co-authored-by: Rishabh Jain <[email protected]>
1 parent c9b05b5, commit bc7b8eb

Showing 12 changed files with 164 additions and 2 deletions.
Empty file.
    @@ -0,0 +1,16 @@
    {{ define "slack.apis.title" -}}
    {{- if .CommonAnnotations.title -}}
    {{- .CommonAnnotations.title -}}
    {{- else -}}
    API-Threshold-Exceeded
    {{- end -}}
    {{- end }}
    {{ define "slack.apis.text" -}}
    {{- if .CommonAnnotations.description -}}
    {{- .CommonAnnotations.description -}}
    {{- else -}}
    {{- range $i, $alert := .Alerts }}
    {{- "\n" -}}{{- .Annotations.description -}}
    {{- end -}}
    {{- end -}}
    {{- end }}
    @@ -0,0 +1,18 @@
    {{ define "slack.instances.title" -}}
    {{- if .CommonAnnotations.title -}}
    {{- .CommonAnnotations.title -}}
    {{- else -}}
    {{- with index .Alerts 0 -}}
    {{- .Annotations.title -}}
    {{- end -}}
    {{- end -}}
    {{- end }}
    {{ define "slack.instances.text" -}}
    {{- if .CommonAnnotations.description -}}
    {{- .CommonAnnotations.description -}}
    {{- else -}}
    {{- range $i, $alert := .Alerts }}
    {{- "\n" -}}{{- .Annotations.description -}}
    {{- end -}}
    {{- end -}}
    {{- end }}
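
The fallback logic in the two Slack templates above can be sketched in plain Python. This is only an illustration of the control flow, not Alertmanager's own code: the function names and the dictionary shapes (`common` for the shared annotations, `alerts` for a list of per-alert annotation dictionaries) are assumptions made for this sketch.

```python
from typing import Dict, List

# Hard-coded fallback title taken from the "slack.apis.title" template.
DEFAULT_API_TITLE = "API-Threshold-Exceeded"


def slack_text(common: Dict[str, str], alerts: List[Dict[str, str]]) -> str:
    """Mimic "slack.apis.text" and "slack.instances.text".

    Use the shared description when every alert in the group has the same
    one; otherwise emit each alert's description on its own line.
    """
    if common.get("description"):
        return common["description"]
    # Each description is preceded by a newline, as in the template.
    return "".join("\n" + alert.get("description", "") for alert in alerts)


def apis_title(common: Dict[str, str]) -> str:
    """Mimic "slack.apis.title": shared title, else the fixed default."""
    return common.get("title") or DEFAULT_API_TITLE


def instances_title(common: Dict[str, str], alerts: List[Dict[str, str]]) -> str:
    """Mimic "slack.instances.title": shared title, else the first alert's."""
    if common.get("title"):
        return common["title"]
    # Assumes the notification carries at least one alert.
    return alerts[0].get("title", "")
```

The only difference between the two title templates is the fallback: the API template uses a fixed string, while the instance template borrows the title of the first alert in the group.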
    @@ -0,0 +1,31 @@
    global:
      resolve_timeout: 10s
      slack_api_url: 'https://hooks.slack.com/services/x/x/x'

    route:
      receiver: 'slack-apis-notifications'
      group_interval: 10s
      repeat_interval: 10s
      routes:
        - matchers: [group = api]
          receiver: 'slack-apis-notifications'
        - matchers: [group = instance]
          receiver: 'slack-instance-notifications'

    receivers:
      - name: 'slack-apis-notifications'
        slack_configs:
          - channel: '#x'
            title: '{{ template "slack.apis.title" . }}'
            text: '{{ template "slack.apis.text" . }}'
            send_resolved: false
      - name: 'slack-instance-notifications'
        slack_configs:
          - channel: '#x'
            title: '{{ template "slack.instances.title" . }}'
            text: '{{ template "slack.instances.text" . }}'
            send_resolved: false

    templates:
      - '/etc/alertmanager/templates/instances.tmpl'
      - '/etc/alertmanager/templates/apis.tmpl'
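
As a rough illustration of how the `route` block above selects a receiver, the following Python sketch walks the two child routes in order and falls back to the top-level receiver when no `group` label matches. The names `CHILD_ROUTES` and `pick_receiver` are hypothetical, and the sketch ignores grouping, timing, and `continue` behaviour, which the real Alertmanager also handles.

```python
from typing import Dict, List, Tuple

# Top-level (default) receiver from the route block.
DEFAULT_RECEIVER = "slack-apis-notifications"

# Child routes as (required value of the `group` label, receiver) pairs,
# in the order they appear in the config.
CHILD_ROUTES: List[Tuple[str, str]] = [
    ("api", "slack-apis-notifications"),
    ("instance", "slack-instance-notifications"),
]


def pick_receiver(labels: Dict[str, str]) -> str:
    """Return the receiver for an alert with the given labels.

    The first child route whose matcher is satisfied wins; alerts that
    match no child route go to the default receiver.
    """
    for group, receiver in CHILD_ROUTES:
        if labels.get("group") == group:
            return receiver
    return DEFAULT_RECEIVER
```

Under this reading, an alert labelled `group: instance` is sent to `slack-instance-notifications`, while an alert with no `group` label at all still reaches Slack through the default receiver.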
    @@ -0,0 +1,24 @@
    groups:
      - name: API-Threshold-Exceeded
        rules:
          - alert: API-Threshold-Exceeded-5XX
            expr: rate(django_request_count{job="statsd",method="GET",status=~"5..",view=~"jobs:.*|challenges:.*"}[5m]) > 0.3
            for: 5m
            annotations:
              title: 'API-Threshold-Exceeded - 5XX'
              description: '•*{{ $labels.view }}* had *{{ $value | printf "%.1f" }}* QPS rate with the response code of *5XX* in the last 5 minutes'
            labels:
              severity: 'critical'
              group: 'api'

      - name: Instance-Status
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            annotations:
              title: "Instance(s) Down"
              description: "•*{{ $labels.instance }}* of prometheus job *{{ $labels.job }}* has been down for more than 5 minutes"
            labels:
              severity: major
              group: 'instance'
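
To give a feel for the `> 0.3` threshold in the 5XX rule above, here is a simplified Python sketch of the arithmetic. It assumes a counter that only increases within the window and divides the increase by the window length; the real Prometheus `rate()` also handles counter resets and extrapolates to the window edges, so treat the numbers as approximate. All function names are made up for this example.

```python
WINDOW_SECONDS = 5 * 60   # the [5m] range in the rule expression
THRESHOLD = 0.3           # requests per second, from the rule expression


def simple_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter over the window."""
    return (last - first) / window_seconds


def exceeds_threshold(first: float, last: float) -> bool:
    """True when the simplified 5-minute rate is above the rule's threshold."""
    return simple_rate(first, last, WINDOW_SECONDS) > THRESHOLD


def format_value(value: float) -> str:
    """Format the value the way the annotation's printf "%.1f" does."""
    return "%.1f" % value
```

With these assumptions, 120 matching 5XX responses in five minutes give a rate of 0.4 per second and satisfy the expression, while 60 responses (0.2 per second) do not. Because the rule also sets `for: 5m`, the condition has to keep holding for five minutes before the alert fires.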