Sync dependencies to Redis #10290

yhabteab · 2025-01-13T08:31:02Z

SSIA 🤪! Just kidding! This PR synchronises all the required points from Icinga/icingadb#347 (comment) to Redis. However, I'm not going to explain every implementation detail of the PR here, but it can be roughly understood as follows.

0af4679 and fe8169a syncs the host/service.affected_children and state.affects_children attributes to Redis as described in Track effect of an object on dependent children #10158.
19862bd Adds the no_user_modify flag to the redundancy_group attribute in the dependency.ti file to prevent any runtime alteration of its value, as there is now too much logic and functionality depending on this value and changing it at runtime would have a disastrous end.
9f4655b Introduces a new auxiliary class DependencyGroup to easily group and manage identical dependencies of any checkable in one place. Yes, this is also used to group non-redundant dependencies, but such a group is entirely unique for each checkable and is never referenced by other checkables.
6638f36 Introduces a global shared registry for the dependency groups. In addition, here is also where all the dependency deduplication magic happens.
f478e5b Introduces an auxiliary method for determining the state of any DependencyGroup at any given time as described in Let redundancy groups not just fail #10190.
49e6ae2 and 2ea7e59 Dumps all dependency-related information to Redis at Icinga 2 startup/reload.
1fa8840 Same as the above two commits but only for dependencies runtime state updates.
e75d884 Processes runtime removed/created dependency objects.
ae269a8 Removes the obsolete failedDependency parameter of the Checkable::IsReachable() method. It is obsolete because none of the callers make use of it, it just adds unnecessary complexity to the method for no reason.
90f20b8 Changes the implementation of the Checkable::IsReachable() method and utilises the DependencyGroup::GetState() method introduced above.
6e7157e In order to handle runtime removed/created dependencies without additional overhead, the activation_priority of the Dependency object is set to -10 (just like for downtime objects). This way, the dependency objects will always get activated before their child/parent Checkables.

fixes #10158
fixes #10190
fixes #10227
fixes #10014
fixes #10143

julianbrost

I'm somewhat confused by the DependencyGroup class as it doesn't really map to the mental model I had from our discussions on that topic.

So in my understanding, a DependencyGroup would represent a set a set of checkables that are used as parents in dependency config objects combined with the attributes from that dependency object that affect how the availability of that dependency is determined, i.e. ignore_soft_states, period, and states. For dependencies without a redundancy group, that information is all that's needed to determine if all dependency objects that share the parent and these attribute mark all their children as unreachable. With a redundancy group, you have to look at all the parents from the redundancy group with the three aforementioned additional attributes. So that would be how to determine what creates a DependencyGroup object for redundancy groups.

For dependencies without a redundancy group, this grouping provides no value in itself, the dependency objects can be considered individually. There are two reasons why we might instantiate such trivial groups explicitly nonetheless: for one, it may allow simpler code by being able to treat both cases consistently, but more importantly, there was a request from Johannes that if two children depend on the same parent in such a way that the state of these dependencies is always the same (i.e. the three aforementioned attributes are identical), then the different graph edges should refer to the same shared state. These groups may be used for this deduplication as well.

Consider the following example (P = parent checkable, RG = redundancy group as represented in the generated graph, C = child checkable):

graph BT;
p1((P1));
p2((P2));
p3((P3));
c1((C1));
c2((C2));
c3((C3));
c4((C4));
c5((C5));
rg1(RG1);
c1-->rg1;
c2-->rg1;
c3-->rg1;
rg1-->p1;
rg1-->p2;
c4-->p3;
c5-->p3;

Here I'd expect the creation of the following two DependencyGroups (... refers to the three magic attributes attached to the parent in the corresponding dependency objects):

{(P1, ...), (P2, ...)}: This basically represents RG1
{(P3, ...)}: This is a if there was an imaginary second redundancy with only one parent, P3.

lib/icingadb/icingadb-objects.cpp

lib/icinga/dependency.cpp

lib/icinga/dependency.hpp

lib/icinga/dependency.cpp

lib/icinga/dependency.hpp

This does not work in this state! Trying to refresh Dependency if a Host or Service being member of this Dependency has a state change.

lib/icinga/checkable-dependency.cpp

lib/icinga/dependency.hpp

lib/icinga/checkable-dependency.cpp

The previous limit (32) doesn't seem to make sense, and appears to be some random number. So, this limit is set to 256 to match the limit in IsReachable().

lib/icinga/checkable-dependency.cpp

julianbrost

I've written a few inline comments regarding the term "member" but stopped at some point. Can you please go over all newly introduced things named that way and consider making the naming a bit more clear by changing that towards something like children/dependencies/parents?

julianbrost · 2025-02-10T16:03:24Z

lib/icingadb/icingadb-objects.cpp

+			edgeFromNodeId = redundancyGroupId;
+
+			// Sync redundancy group information only once unless it's a runtime update.
+			if (runtimeUpdates || m_DumpedGlobals.DependencyGroup.IsNew(redundancyGroupId)) {
+				syncSharedEdgeState = true;
+
+				Dictionary::Ptr groupData(new Dictionary{
+					{"environment_id", m_EnvironmentId},
+					{"display_name", dependencyGroup->GetName()},
+				});
+
+				hmsetRedundancyGroups.emplace_back(redundancyGroupId);
+				hmsetRedundancyGroups.emplace_back(JsonEncode(groupData));
+
+				Dictionary::Ptr nodeData(new Dictionary{
+					{"environment_id", m_EnvironmentId},
+					{"redundancy_group_id", redundancyGroupId},
+				});
+
+				hmsetDependencyNodes.emplace_back(redundancyGroupId);
+				hmsetDependencyNodes.emplace_back(JsonEncode(nodeData));
+
+				if (runtimeUpdates) {
+					AddObjectDataToRuntimeUpdates(*runtimeUpdates, redundancyGroupId, m_PrefixConfigObject + "redundancygroup", groupData);
+					AddObjectDataToRuntimeUpdates(*runtimeUpdates, redundancyGroupId, m_PrefixConfigObject + "dependency:node", nodeData);
+				}
+
+				auto stateAttrs(SerializeRedundancyGroupState(dependencyGroup));
+				hmsetRedundancyGroupsStates.emplace_back(redundancyGroupId);
+				hmsetRedundancyGroupsStates.emplace_back(JsonEncode(stateAttrs));
+
+				hmsetDependenciesStates.emplace_back(redundancyGroupId);
+				hmsetDependenciesStates.emplace_back(JsonEncode(new Dictionary{
+					{"id", redundancyGroupId},
+					{"environment_id", m_EnvironmentId},
+					{"failed", stateAttrs->Get("failed")},
+				}));
+			}
+
+			Dictionary::Ptr data(new Dictionary{
+				{"environment_id", m_EnvironmentId},
+				{"from_node_id", checkableId},
+				{"to_node_id", redundancyGroupId},
+				// All redundancy group members share the same state, thus use the group ID as a reference.
+				{"dependency_edge_state_id", redundancyGroupId},
+				{"display_name", dependencyGroup->GetName()},
+			});
+
+			auto edgeId(HashValue(new Array{checkableId, redundancyGroupId}));
+			hmsetDependencyEdges.emplace_back(edgeId);
+			hmsetDependencyEdges.emplace_back(JsonEncode(data));
+
+			if (runtimeUpdates) {
+				AddObjectDataToRuntimeUpdates(*runtimeUpdates, edgeId, m_PrefixConfigObject + "dependency:edge", data);
+			}
+		}
+
+		auto members(dependencyGroup->GetMembers(checkable.get()));
+		for (auto it(members.begin()); it != members.end(); /* no increment */) {
+			auto dependency(*it);
+			auto parent(dependency->GetParent());
+			auto displayName(dependency->GetShortName());
+
+			Dictionary::Ptr memberStateAttrs(SerializeDependencyEdgeState(dependencyGroup, dependency));
+
+			// All dependency objects that share the same parent Checkable are placed next to each other in the
+			// container. Thus, "it" will either point to the next dependency with a different parent or to the
+			// end of the container after the below loop. Typically, this case isn't that common to happen in
+			// production, and if it indeed does, then it's probably not intended by the user but Icinga 2 itself
+			// accepts such config anyway, and we should do the same, i.e., merge them into a single edge.
+			while (++it != members.end() && (*it)->GetParent() == parent) {
+				displayName += ", " + (*it)->GetShortName();
+				if (syncSharedEdgeState && memberStateAttrs->Get("failed") == false) {
+					memberStateAttrs = SerializeDependencyEdgeState(dependencyGroup, *it);
+				}
+			}
+
+			Dictionary::Ptr data(new Dictionary{
+				{"environment_id", m_EnvironmentId},
+				{"from_node_id", edgeFromNodeId},
+				{"to_node_id", GetObjectIdentifier(parent)},
+				{"dependency_edge_state_id", memberStateAttrs->Get("id")},
+				{"display_name", std::move(displayName)},
+			});
+
+			auto edgeId(HashValue(new Array{data->Get("from_node_id"), data->Get("to_node_id")}));
+			hmsetDependencyEdges.emplace_back(edgeId);
+			hmsetDependencyEdges.emplace_back(JsonEncode(data));
+
+			if (runtimeUpdates) {
+				AddObjectDataToRuntimeUpdates(*runtimeUpdates, edgeId, m_PrefixConfigObject + "dependency:edge", data);
+			}
+
+			if (syncSharedEdgeState) {
+				hmsetDependenciesStates.emplace_back(memberStateAttrs->Get("id"));
+				hmsetDependenciesStates.emplace_back(JsonEncode(memberStateAttrs));
+			}
+		}
+	}
+}


That function is very long. I'm unsure how feasible it would be to split if into separate functions given the overall code structure in this file. However, I'd add least add some comments that guide through the overall structure of the function, like saying that a code block inserts the graph node for the child checkable, the next one inserts the dependency group and then its graph node, then the edges from this to that node and so on.

julianbrost · 2025-02-10T16:13:46Z

lib/icingadb/icingadb-objects.cpp

@@ -1175,6 +1368,82 @@ void IcingaDB::UpdateState(const Checkable::Ptr& checkable, StateUpdate mode)

 		m_Rcon->FireAndForgetQuery(std::move(streamadd), Prio::RuntimeStateStream, {0, 1});
 	}
+
+	UpdateDependenciesState(checkable);


Why doesn't this call take mode into account? Looks like UpdateDependenciesState() sends only runtime updates, so:

Shouldn't it be done only in if (mode & StateUpdate::RuntimeOnly)?

Where do the corresponding volatile state updates happen?

Where do the corresponding volatile state updates happen?

According to my tests, it looks like they don't happen at all indeed. If a dependency state changes, the corresponding row in dependency_edge_state (SQL) updates, but if I select the corresponding value from icinga:dependency:edge:state Redis, this seems to be outdated.

julianbrost · 2025-02-11T08:47:13Z

lib/icinga/checkable-check.cpp

-	if ((stateChange || hardChange) && !children.empty())
+	if ((stateChange || hardChange) && !children.empty() && (IsStateOK(new_state) || AffectsChildren()))


I don't think that this change is correct: if you have a dependency that's only considered failed on critical services and a service improves from critical to warning, wouldn't this swallow the change?

Overseen the 🐘!

lib/icinga/checkable-dependency.cpp

lib/icinga/dependency-group.cpp

lib/icinga/dependency.hpp

lib/icinga/dependency-group.cpp

julianbrost · 2025-02-11T10:51:21Z

lib/icinga/dependency-group.cpp

+				state.OK = dependency->IsAvailable(dt);
+				// If this is a redundancy group, and we have found one functional path, that's enough and we can return.
+				// Likewise, if this is a non-redundant dependency group, and we have found one non-functional path,
+				// we have to mark the group as failed and return.
+				if (state.OK == IsRedundancyGroup()) { // OK && IsRedundancyGroup() || !OK && !IsRedundancyGroup()
+					return state;
+				}


That seems to always consider the dependency. Shouldn't there also be a case that can skip a specific dependency when it's outside of its configures time period?

The previous implementation had no such special handling, so I won't do that here as well, but if in doubt, will fix it in a separate PR as discussed.

julianbrost · 2025-02-11T12:29:54Z

lib/icinga/dependency.hpp

+	struct State
+	{
+		bool Reachable; // Whether the dependency group is reachable.
+		bool OK; // Whether the dependency group is reachable and OK.
+	};


bool OK; // Whether the dependency group is reachable and OK.

So OK only contains a meaningful value if Reachable = true, i.e. there are just 3 logical values here? Seeing it like this now: wouldn't an enum with values Ok, Failed, Unreachable actually better represent this?

lib/icinga/dependency-group.cpp

Otherwise, it would require too much code changes to properly handle redundancy group runtime modification in Icinga DB for no real benefit.

Co-Authored-By: Yonas Habteab <[email protected]>

It's way efficient than accessing them through the dependency objects, plus we won't have any duplicates.

The new implementation just counts reachable and available parents and determines the overall result by comparing numbers, see inline comments for more information. This also fixes an issue in the previous implementation: if it didn't return early from the loop, it would just return the state of the last parent considered which may not actually represent the group state accurately.

julianbrost · 2025-02-14T14:23:20Z

lib/icinga/dependency-group.cpp

+	// contain more elements if there are duplicate dependency config objects between two checkables. In this case,
+	// all of them have to be reachable or available as they don't provide redundancy. Note that unreachable implies
+	// unavailable here as well as only reachable parents count towards the number of available parents.
+	return {reachable == members.size(), available == members.size()};


members.size() will no longer work here if you may check (and thus count) more than one dependency per entry.

julianbrost · 2025-02-14T14:26:10Z

lib/icinga/checkable-dependency.cpp

+/**
+ * Checks whether the last check result of this Checkable affects its child dependencies.
+ *
+ * A Checkable affects its child dependencies if it runs into a non-OK state and results in any of its child
+ * Checkables to become unreachable. Though, that unavailable dependency may not necessarily cause the child
+ * Checkable to be in unreachable state as it might have some other dependencies that are still reachable, instead
+ * it just indicates whether the edge/connection between this and the child Checkable is broken or not.
+ *
+ * @return bool - Returns true if the Checkable affects its child dependencies, otherwise false.
+ */
+bool Checkable::AffectsChildren(const CheckResult::Ptr& cr) const
+{
+	if (!cr || !IsReachable()) {
+		// If there is no check result, or the Checkable is not reachable, we can't safely determine whether
+		// the Checkable affects its child dependencies.
 		return false;
 	}


As you've already noticed yourself, this doesn't really uses cr (besides checking it for nullptr), so that probably doesn't work as intended.

julianbrost · 2025-02-14T15:52:31Z

Oh damn it, I've completely missed something so far: for Icinga DB, each row must be completely "described" by their ID (that is, if a value in any column changes, that yields a new ID and deletes the old row) or needs a checksum (see properties_checksum) columns in existing tables in the schema.

Previously the dependency state was evaluated by picking the first dependency object from the batched members. However, since the dependency `disable_{checks,notifications` attributes aren't taken into account when batching the members, the evaluated state may yield a wrong result for some Checkables due to some random dependency from other Checkable of that group that has the `disable_{checks,notifications` attrs set. This commit forces the callers to always provide the child Checkable the state is evaluated for and picks only the dependency objects of that child Checkable.

…pendency Previously the cycle detection relied on the current Dependency object being registered in to the child's dependencies list. However, this requirement can easily be eliminated by directly pushing the child into the dependency graph stack before even starting the cycle detection. This way, we don't need to need to register the dependency via `AddDependency` before triggering the check and remove via `RemoveDependency()` if the check fails, anymore.

cla-bot bot added the cla/signed label Jan 13, 2025

yhabteab mentioned this pull request Jan 13, 2025

Checkable: Don't skip redundancy group checks for parent dependencies #10228

Closed

yhabteab force-pushed the icingadb-dependencies-sync branch from 95a27d3 to 17ba7c9 Compare January 15, 2025 16:28

julianbrost reviewed Jan 17, 2025

View reviewed changes

yhabteab force-pushed the icingadb-dependencies-sync branch from 17ba7c9 to a1175d1 Compare January 20, 2025 07:17

julianbrost reviewed Jan 20, 2025

View reviewed changes

lib/icinga/dependency.hpp Outdated Show resolved Hide resolved

yhabteab force-pushed the icingadb-dependencies-sync branch 2 times, most recently from 763d77c to bc82c04 Compare January 22, 2025 12:05

yhabteab mentioned this pull request Jan 24, 2025

Icinga DB state_type and is_acknowledged keys in redis don't match with what's inserted into the DB #9427

Open

IcingaDB: Start keeping track of Host/Service to Dependency relationship

9e0f750

This does not work in this state! Trying to refresh Dependency if a Host or Service being member of this Dependency has a state change.

yhabteab force-pushed the icingadb-dependencies-sync branch from bc82c04 to 2889822 Compare January 27, 2025 07:56

yhabteab added this to the 2.15.0 milestone Jan 27, 2025

yhabteab force-pushed the icingadb-dependencies-sync branch 2 times, most recently from 11c6498 to 78a0a29 Compare January 29, 2025 10:09

julianbrost reviewed Jan 30, 2025

View reviewed changes

yhabteab added 2 commits January 30, 2025 15:11

Checkable: Introduce GetAllChildrenCount() method

5fc78da

The previous limit (32) doesn't seem to make sense, and appears to be some random number. So, this limit is set to 256 to match the limit in IsReachable().

IcingaDB: Add affected_children to Host/Service Redis updates

66c6291

yhabteab force-pushed the icingadb-dependencies-sync branch from 4e5e2c8 to 9c678be Compare January 30, 2025 15:18

yhabteab requested a review from julianbrost January 30, 2025 15:24

julianbrost reviewed Jan 31, 2025

View reviewed changes

lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved

lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved

lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved

yhabteab force-pushed the icingadb-dependencies-sync branch 2 times, most recently from dc5da1b to 215057f Compare February 3, 2025 07:35

julianbrost reviewed Feb 3, 2025

View reviewed changes

lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved

lib/icinga/checkable-dependency.cpp Outdated Show resolved Hide resolved

yhabteab force-pushed the icingadb-dependencies-sync branch from 215057f to 065626d Compare February 6, 2025 12:04

yhabteab mentioned this pull request Feb 10, 2025

Sync dependencies Icinga/icingadb#889

Open

yhabteab force-pushed the icingadb-dependencies-sync branch from 546936c to 85dd4fa Compare February 10, 2025 10:40

julianbrost reviewed Feb 11, 2025

View reviewed changes

yhabteab added 2 commits February 11, 2025 17:33

IcingaDB: Sync affects_children as part of runtime state updates

c85939b

Dependency: Don't allow to change redundancy_group at runtime

a352e38

Otherwise, it would require too much code changes to properly handle redundancy group runtime modification in Icinga DB for no real benefit.

Checkable: Drop unused failedDependency argument from IsReachable()

2bb256b

yhabteab force-pushed the icingadb-dependencies-sync branch from 85dd4fa to aceaa9a Compare February 11, 2025 16:35

yhabteab and others added 18 commits February 12, 2025 09:06

Introduce DependencyGroup helper class

7965f8a

Add DependencyGroup::GetState() helper method

84ebd03

Checkable: Store dependencies grouped by their redundancy group

22a81bf

Checkable: Introduce HasAnyDependencies() method

c890a5b

Checkable: Use redundancy groups state in IsReachable

bf9fe46

tests: Add unittests for the redundancy groups registry

cb83c3b

IcingaDB: Bump expected redis version to 6

a69821f

IcingaDB: Dump checkables dependencies config to redis correctly

3c596ab

IcingaDB: Sync dependencies states to Redis

60841a6

IcingaDB: Sync dependencies initial states on config dump

58c2690

Simplify dependency group registration

76c6d5c

Co-Authored-By: Yonas Habteab <[email protected]>

Checkable: Delay dependency group global registration on startup

b355b24

Fix & adjust dependencies unittests

fa0ba76

Add basic unittests for bulk group registration

fa0318b

Checkable: Emit boost signals when changing dependency groups at runtime

098a424

Checkable: Extract parents directly from dependency groups

5b56433

Use GetParents() in FireSppressedNotifications()

7d7b4c1

It's way efficient than accessing them through the dependency objects, plus we won't have any duplicates.

IcingaDB: Process dependencies runtime updates

e46b25d

yhabteab force-pushed the icingadb-dependencies-sync branch from aceaa9a to e46b25d Compare February 12, 2025 08:06

yhabteab force-pushed the icingadb-dependencies-sync branch from 3578699 to e5852cc Compare February 14, 2025 12:22

julianbrost reviewed Feb 14, 2025

View reviewed changes

yhabteab added 4 commits February 14, 2025 17:02

Checkable: Drop superfluous cr param of AffectsChildren()

55d62dd

(Un)register dependencies from parent prior to child Checkable

ad2f995

yhabteab force-pushed the icingadb-dependencies-sync branch from 6f99bbd to ad2f995 Compare February 14, 2025 16:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync dependencies to Redis #10290

Sync dependencies to Redis #10290

yhabteab commented Jan 13, 2025 •

edited

Loading

julianbrost left a comment

julianbrost left a comment

julianbrost Feb 10, 2025

julianbrost Feb 10, 2025

julianbrost Feb 14, 2025

julianbrost Feb 11, 2025

yhabteab Feb 11, 2025

julianbrost Feb 11, 2025

yhabteab Feb 11, 2025

julianbrost Feb 11, 2025

julianbrost Feb 14, 2025

julianbrost Feb 14, 2025

julianbrost commented Feb 14, 2025

		if ((stateChange \|\| hardChange) && !children.empty())
		if ((stateChange \|\| hardChange) && !children.empty() && (IsStateOK(new_state) \|\| AffectsChildren()))

Sync dependencies to Redis #10290

Are you sure you want to change the base?

Sync dependencies to Redis #10290

Conversation

yhabteab commented Jan 13, 2025 • edited Loading

julianbrost left a comment

Choose a reason for hiding this comment

julianbrost left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

julianbrost commented Feb 14, 2025

yhabteab commented Jan 13, 2025 •

edited

Loading