sandboxfy alerting tutorial part 2 #147

Merged: 6 commits merged on Oct 29, 2024
7 changes: 7 additions & 0 deletions grafana/alerting-get-started-pt2/finish.md
@@ -0,0 +1,7 @@
# Summary

In this tutorial, you have learned how Grafana Alerting can route individual alert instances using the labels generated by the data-source query and match these labels with notification policies, which in turn route alert notifications to specific contact points.

If you run into any problems, you are welcome to post questions in our [Grafana Community forum](https://community.grafana.com/).

Enjoy your monitoring!
41 changes: 41 additions & 0 deletions grafana/alerting-get-started-pt2/index.json
@@ -0,0 +1,41 @@
{
  "title": "Get started with Grafana Alerting - Part 2",
  "description": "Learn to use alert instances and route notifications by labels to contact points, building on your Grafana Alerting skills for more advanced workflows (Part 2).",
  "details": {
    "intro": {
      "text": "intro.md"
    },
    "steps": [
      {
        "text": "step1.md"
      },
      {
        "text": "step2.md"
      },
      {
        "text": "step3.md"
      },
      {
        "text": "step4.md"
      },
      {
        "text": "step5.md"
      },
      {
        "text": "step6.md"
      },
      {
        "text": "step7.md"
      },
      {
        "text": "step8.md"
      }
    ],
    "finish": {
      "text": "finish.md"
    }
  },
  "backend": {
    "imageid": "ubuntu"
  }
}
15 changes: 15 additions & 0 deletions grafana/alerting-get-started-pt2/intro.md
@@ -0,0 +1,15 @@
# Get started with Grafana Alerting - Part 2

This tutorial is a continuation of [Get started with Grafana Alerting - Part 1](http://www.grafana.com/tutorials/alerting-get-started/).

In this guide, we dig into more complex yet equally fundamental elements of Grafana Alerting: **alert instances** and **notification policies**.

After introducing each component, you will learn how to:

- Configure an alert rule that returns more than one alert instance

- Create notification policies that route firing alert instances to different contact points

- Use labels to match alert instances and notification policies

Learning about alert instances and notification policies is useful if you have more than one contact point in your organization, or if your alert rule returns multiple metrics that you want to handle separately by routing each alert instance to a specific contact point. The tutorial introduces each concept and then shows how to apply both in a real-world scenario.
310 changes: 310 additions & 0 deletions grafana/alerting-get-started-pt2/preprocessed.md

25 changes: 25 additions & 0 deletions grafana/alerting-get-started-pt2/step1.md
@@ -0,0 +1,25 @@
To observe data using the Grafana stack, download and run the following files.

1. Clone the [tutorial environment repository](https://www.github.com/grafana/tutorial-environment).

```
git clone https://github.com/grafana/tutorial-environment.git
```{{exec}}

1. Change to the directory where you cloned the repository:

```
cd tutorial-environment
```{{exec}}

1. Run the Grafana stack:

```bash
docker-compose up -d
```{{exec}}

The first time you run `docker-compose up -d`{{copy}}, Docker downloads all the necessary resources for the tutorial. This might take a few minutes, depending on your internet connection.

NOTE:

If you already have Grafana, Loki, or Prometheus running on your system, you might see errors because the Docker containers try to use ports that your local installations are already using. If this is the case, stop the services, then run the command again.
11 changes: 11 additions & 0 deletions grafana/alerting-get-started-pt2/step2.md
@@ -0,0 +1,11 @@
# Alert instances

An [alert instance](https://grafana.com/docs/grafana/latest/alerting/fundamentals/#alert-instances) is an event that matches a metric returned by an alert rule query.

Let’s consider a scenario where you’re monitoring website traffic using Grafana. You’ve set up an alert rule to trigger an alert instance if the number of page views exceeds a certain threshold (more than `1000`{{copy}} page views) within a specific time period, say, over the past `5`{{copy}} minutes.

If the query returns more than one time-series, each time-series represents a different metric or aspect being monitored. In this case, the alert rule is applied individually to each time-series.

![Screenshot displaying alert instances in the context of an alert rule, highlighting the specific alerts triggered by the rule and their respective statuses](https://grafana.com/media/docs/alerting/alert-instance-flow.jpg)

In this scenario, each time-series is evaluated independently against the alert rule, which creates one alert instance per time-series. The time-series for desktop page views meets the threshold and therefore produces an alert instance in **Firing** state, for which an alert notification is sent. The mobile alert instance remains in **Normal** state.
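As a rough illustration of per-series evaluation, the shell sketch below assumes per-minute page-view samples for two series, `desktop` and `mobile`, over the past 5 minutes. The sample values and the `evaluate_instances` function are hypothetical, invented only for this example. It sums each series independently and compares each total with the `1000` threshold, so two series produce two alert instances with separate states. Grafana performs this evaluation itself; the script only mirrors the logic.

```bash
#!/usr/bin/env bash
# Hypothetical per-minute page views over the past 5 minutes.
# Columns: device, minute, views (values invented for illustration).
samples='desktop 1 250
desktop 2 230
desktop 3 260
desktop 4 240
desktop 5 220
mobile 1 170
mobile 2 180
mobile 3 190
mobile 4 175
mobile 5 185'

# evaluate_instances THRESHOLD: read "device minute views" lines on stdin,
# sum the views of each series independently, and print one alert
# instance per series with its state.
evaluate_instances() {
  local threshold=$1
  awk -v t="$threshold" '
    { sum[$1] += $3 }
    END {
      for (d in sum) {
        state = (sum[d] > t + 0) ? "Firing" : "Normal"
        printf "device=%s views=%d state=%s\n", d, sum[d], state
      }
    }' | sort
}

printf '%s\n' "$samples" | evaluate_instances 1000
# prints:
# device=desktop views=1200 state=Firing
# device=mobile views=900 state=Normal
```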
13 changes: 13 additions & 0 deletions grafana/alerting-get-started-pt2/step3.md
@@ -0,0 +1,13 @@
# Notification policies

[Notification policies](https://grafana.com/docs/grafana/latest/alerting/fundamentals/notifications/notification-policies/) route alerts to different communication channels, reducing alert noise and providing control over when and how alerts are sent. For example, you might use notification policies to ensure that critical alerts about server downtime are sent immediately to the on-call engineer. Another use case could be routing performance alerts to the development team for review and action.

Key characteristics:

- Route alert notifications by matching alerts and policies with labels

- Manage when to send notifications

![Screenshot illustrating the routing of alerts with notification policies, including the configuration and flow of alerts through different notification channels](https://grafana.com/media/docs/alerting/get-started-notification-policy-tree-combo.png)

In the above diagram, alert instances and notification policies are matched by labels. For instance, the label `team=operations`{{copy}} matches the alert instances “**Pod stuck in CrashLoop**” and “**Disk Usage -80%**” to child policies that send alert notifications to a particular contact point (<[email protected]>).
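The label matching can be sketched in shell, assuming a deliberately simplified policy tree: child policies are checked in order, the first child whose single `label=value` matcher equals one of the instance's labels handles it, and unmatched instances fall back to the Default policy. The policy list, the contact point names (`operations-email`, `security-email`, `default-email`), the `alertname` values, and the `route` function are all hypothetical. Real notification policies also support multiple matchers, regex and negative operators, nested policies, and continued matching of sibling policies, none of which are modeled here.

```bash
#!/usr/bin/env bash
# Hypothetical child policies, in evaluation order: "matcher contact-point".
policies='team=operations operations-email
team=security security-email'

# Hypothetical contact point of the Default policy.
DEFAULT_CONTACT_POINT=default-email

# route LABEL...: print the contact point for one alert instance.
# The first child policy whose matcher equals one of the instance's
# labels wins; otherwise the Default policy's contact point is used.
route() {
  local matcher contact label
  while read -r matcher contact; do
    for label in "$@"; do
      if [[ $label == "$matcher" ]]; then
        echo "$contact"
        return 0
      fi
    done
  done <<< "$policies"
  echo "$DEFAULT_CONTACT_POINT"
}

route alertname=PodCrashLoop team=operations   # prints: operations-email
route alertname=DiskUsage team=operations      # prints: operations-email
route alertname=HighLatency team=frontend      # prints: default-email
```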
21 changes: 21 additions & 0 deletions grafana/alerting-get-started-pt2/step4.md
@@ -0,0 +1,21 @@
# Create notification policies

Create a notification policy if you want to handle the metrics returned by an alert rule separately, routing each alert instance to a specific contact point.

1. Visit [http://localhost:3000]({{TRAFFIC_HOST1_3000}}), where Grafana should be running.

1. Click the menu icon in the top-left corner of the screen to open the navigation menu, then navigate to **Alerts & IRM > Alerting > Notification policies**.

1. In the Default policy, click **+ New child policy**.

1. In the field **Label** enter `device`{{copy}}, and in the field **Value** enter `desktop`{{copy}}.

1. From the **Contact point** drop-down, choose **Webhook**.

If you don’t have any contact points, add a [Contact point](https://grafana.com/tutorials/alerting-get-started/#create-a-contact-point).

1. Click **Save Policy**.

This new child policy routes alerts that match the label `device=desktop`{{copy}} to the Webhook contact point.

1. **Repeat the steps above to create a second child policy** to match another alert instance. For the label, use `device=mobile`{{copy}}, and use the Webhook integration for the contact point. Alternatively, experiment by using a different Webhook endpoint or a [different integration](https://grafana.com/docs/grafana/latest/alerting/configure-notifications/manage-contact-points/#list-of-supported-integrations).
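If you prefer scripting, the same two child policies could likely be described as a JSON policy tree for Grafana's alerting provisioning HTTP API (`PUT /api/v1/provisioning/policies`). This is a sketch under several assumptions: Grafana runs at `localhost:3000` with `admin:admin` credentials, the contact point from Part 1 is named `Webhook`, and the Default policy uses the built-in `grafana-default-email` contact point. The `policy_tree` function and the `GRAFANA_URL` variable are hypothetical. Because a `PUT` replaces the whole policy tree, the `curl` call is left commented out; the UI steps above remain the recommended path for this tutorial.

```bash
#!/usr/bin/env bash
# Assumption: Grafana is reachable here (override with GRAFANA_URL).
GRAFANA_URL=${GRAFANA_URL:-http://localhost:3000}

# policy_tree: print a notification policy tree with one child policy
# per device label value, each routing to the (assumed) Webhook contact point.
policy_tree() {
  cat <<'EOF'
{
  "receiver": "grafana-default-email",
  "routes": [
    { "receiver": "Webhook", "object_matchers": [["device", "=", "desktop"]] },
    { "receiver": "Webhook", "object_matchers": [["device", "=", "mobile"]] }
  ]
}
EOF
}

policy_tree

# Uncomment to apply. The PUT replaces the ENTIRE policy tree, and the
# X-Disable-Provenance header keeps the policies editable in the UI.
# policy_tree | curl -s -X PUT "$GRAFANA_URL/api/v1/provisioning/policies" \
#   -u admin:admin \
#   -H 'Content-Type: application/json' \
#   -H 'X-Disable-Provenance: true' \
#   --data-binary @-
```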
17 changes: 17 additions & 0 deletions grafana/alerting-get-started-pt2/step5.md
@@ -0,0 +1,17 @@
# Create an alert rule that returns alert instances

The alert rule that you are about to create monitors web traffic page views. The objective is to explore what an alert instance is and how to route individual alert instances by using label matchers and notification policies.

## Add a data source

Grafana includes a [test data source](https://grafana.com/docs/grafana/latest/datasources/testdata/) that creates simulated time series data.

1. In Grafana, navigate to **Connections > Add new connection**.

1. Search for **TestData**.

1. Click **Add new data source**.

1. Click **Save & test**.

You should see a message confirming that the data source is working.
39 changes: 39 additions & 0 deletions grafana/alerting-get-started-pt2/step6.md
@@ -0,0 +1,39 @@
# Create an alert rule

1. Navigate to **Alerts & IRM > Alerting > Alert rules**.

1. Click **New alert rule**.

# Enter an alert rule name

Make it short and descriptive, as it appears in your alert notifications. For instance, `web-traffic`{{copy}}.

# Define query and alert condition

In this section, we use the default options for Grafana-managed alert rule creation. The default options let us define the query, an expression (used to manipulate the data: the `WHEN`{{copy}} field in the UI), and the condition that must be met for the alert to be triggered (in default mode, this is the threshold).

1. Select **TestData** data source from the drop-down menu.

1. From **Scenario** select **CSV Content**.

1. In the Query editor, switch to **Code** mode by clicking the button on the right.

1. Copy in the following CSV data:

```
device,views
desktop,1200
mobile,900
```{{copy}}

The above CSV data simulates a data source returning multiple time series, each leading to the creation of an alert instance for that specific time series. Note that the data returned matches the example in the **Alert instances** step.

1. In the **Alert condition** section:

- Keep `Last`{{copy}} as the reducer function (`WHEN`{{copy}}), and set the threshold value to `1000`{{copy}}. This is the value above which the alert rule should trigger.

1. Click **Preview** to run the queries.

It should return two series: `desktop`{{copy}} in Firing state and `mobile`{{copy}} in Normal state. The values `1`{{copy}} and `0`{{copy}} mean that the condition is `true`{{copy}} or `false`{{copy}}, respectively.

![Screenshot showing a preview of a query in Grafana that returns two alert instances, including the query results and relevant alert details](https://grafana.com/media/docs/alerting/firing-instances.png)
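To see why the preview reports `1` and `0`, here is a rough `awk` sketch over the same CSV data. It assumes each CSV row becomes one series labeled by its `device` value; with a single value per series, the `Last` reducer simply returns that value, and comparing it with the `1000` threshold yields `1` (true) or `0` (false). The `evaluate_condition` function is hypothetical and only mirrors the logic; it is not how Grafana executes expressions.

```bash
#!/usr/bin/env bash
# The CSV content used in the query: one row (series) per device.
csv='device,views
desktop,1200
mobile,900'

# evaluate_condition THRESHOLD: for each data row, apply a "Last"
# reduction and a threshold, printing 1 (true) or 0 (false).
evaluate_condition() {
  local threshold=$1
  awk -F, -v t="$threshold" 'NR > 1 {
    last = $2 + 0            # Last reducer: a single-value series reduces to that value
    cond = (last > t + 0)    # threshold: 1 if above, 0 otherwise
    printf "device=%s last=%d condition=%d\n", $1, last, cond
  }'
}

printf '%s\n' "$csv" | evaluate_condition 1000
# prints:
# device=desktop last=1200 condition=1
# device=mobile last=900 condition=0
```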
31 changes: 31 additions & 0 deletions grafana/alerting-get-started-pt2/step7.md
@@ -0,0 +1,31 @@
# Set evaluation behavior

In the [life cycle](http://grafana.com/docs/grafana/next/alerting/fundamentals/alert-rule-evaluation/) of alert instances, when an alert condition (threshold) is not met, the alert instance state is **Normal**. When the condition is breached for longer than the pending period (which in this tutorial is 0), the alert instance state switches to **Alerting**, the alert rule state becomes **Firing**, and a notification is sent.

To set up evaluation behavior:

1. In **Folder**, click **+ New folder** and enter a name. For example: `web-traffic-alerts`{{copy}}. This folder will contain our alerts.

1. In the **Evaluation group**, repeat the above step to create a new evaluation group. We will name it `1m`{{copy}} (referring to “1 minute”).

1. Set the **Evaluation interval** (how often the alert rule is evaluated) to `1m`{{copy}}.

1. Set the pending period to `0s`{{copy}} (zero seconds), so the alert rule fires the moment the condition is met.
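The effect of the pending period can be sketched as a small state simulation, assuming evaluations happen exactly once per evaluation interval and ignoring other states such as **NoData** and **Error**. The `simulate_states` function and its inputs (one `1` or `0` per evaluation, meaning the condition is breached or not) are hypothetical. With a pending period of `0s`, an instance goes straight from **Normal** to **Alerting**; with a longer pending period, it stays in **Pending** until the condition has held for that long.

```bash
#!/usr/bin/env bash
# simulate_states PENDING_SECONDS INTERVAL_SECONDS CONDITION...
# Print the alert instance state after each evaluation, where each
# CONDITION is 1 (threshold breached) or 0 (not breached).
simulate_states() {
  local pending=$1 interval=$2
  shift 2
  local held=-1 cond    # held: seconds the breach has lasted (-1 = no breach)
  local states=()
  for cond in "$@"; do
    if (( cond == 1 )); then
      # Condition breached: start or extend the breach duration.
      if (( held < 0 )); then held=0; else held=$(( held + interval )); fi
      if (( held >= pending )); then states+=(Alerting); else states+=(Pending); fi
    else
      # Condition not met: the instance returns to Normal.
      held=-1
      states+=(Normal)
    fi
  done
  echo "${states[*]}"
}

# Pending period 0s and a 1m interval, as in this tutorial: fires at once.
simulate_states 0 60 0 1 1 0        # prints: Normal Alerting Alerting Normal
# Pending period 2m: the breach must persist before the instance fires.
simulate_states 120 60 0 1 1 1 0    # prints: Normal Pending Pending Alerting Normal
```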

# Configure labels and notifications

In this section, you can select how you want to route your alert instances. Since we want to route by notification policy, we need to ensure that the labels of the alert instances match those of the notification policies.

1. Choose **Use notification policy**.

1. Click **Preview routing**. Based on the existing labels, you should see a preview of which policies match the alerts. There should be two alert instances matching the labels that were previously set up in each notification policy: `device=desktop`{{copy}} and `device=mobile`{{copy}}.

These [types of labels](https://grafana.com/docs/grafana/latest/alerting/fundamentals/alert-rules/annotation-label/#label-types) are generated by the data source query and they can be leveraged to match our notification policies without needing to manually add them to the alert rule.

![Screenshot showing a routing preview of matched notification policies, detailing how alerts are matched and routed to specific notification channels](https://grafana.com/media/docs/alerting/get-started-alert-instace-routing-prev.png)

Even if both labels match the policies, only the alert instance in Firing state produces an alert notification.

1. Click **Save rule and exit**.

Now that we have set up the alert rule, it’s time to check the alert notification.
9 changes: 9 additions & 0 deletions grafana/alerting-get-started-pt2/step8.md
@@ -0,0 +1,9 @@
# Receive alert notifications

Now that the alert rule has been configured, you should receive alert [notifications](http://grafana.com/docs/grafana/next/alerting/fundamentals/alert-rule-evaluation/state-and-health/#notifications) in the contact point whenever the alert triggers and gets resolved. In our example, each alert instance should be routed separately because we configured labels to match notification policies. Once the evaluation interval has elapsed (1m), you should receive an alert notification in the Webhook endpoint.

![Screenshot showing the exploration of alert notification details in a webhook endpoint, displaying the content and structure of the alert payload received by the endpoint](https://grafana.com/media/docs/alerting/get-started-webhook-alert-isntance.png)

The alert notification details show that the alert instance corresponding to the website views from desktop devices was correctly routed through the notification policy to the Webhook contact point. The notification also shows that the instance is in **Firing** state and that it includes the label `device=desktop`{{copy}}, which is what makes routing the alert instance possible.

Feel free to change the CSV data in the alert rule to trigger the routing of the alert instance that matches the label `device=mobile`{{copy}}.
1 change: 1 addition & 0 deletions grafana/structure.json
@@ -2,6 +2,7 @@
  "items": [
    { "path": "grafana-basics", "title": "Grafana Basics"},
    { "path": "alerting-get-started", "title": "Get started with Grafana Alerting"},
    { "path": "alerting-get-started-pt2", "title": "Get started with Grafana Alerting - Part 2"},
    { "path": "alerting-loki-logs", "title": "Create alert rules with logs"},
    { "path": "grafana-fundamentals", "title": "Grafana Fundamentals"},
    { "path": "fo11y", "title": "Frontend Observability"}