2686 add checkpoint with repoid #800

adrian-codecov · 2024-10-18T22:27:42Z

We're extending our checkpoint metrics to add repo-specific metrics. The original approach considered adding a repo_id label to the existing checkpoint logger, but this is an ask for a specific repository. That approach doesn't scale well to many repositories because repo_id has high cardinality, which Prometheus isn't well suited for. The better approach would be to leverage SQL metrics and extend their functionality, but this approach was chosen for time's sake.
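To make the cardinality concern concrete, here is a minimal stdlib-only sketch. It does not use the real Prometheus client; it only counts distinct label combinations, each of which would become its own time series. The flow names, checkpoint names, repository counts, and the `series_count` helper are all hypothetical.

```python
from math import prod


def series_count(label_values: dict[str, list]) -> int:
    """Count distinct label combinations; each one is a separate time series."""
    return prod(len(values) for values in label_values.values())


# Hypothetical label values, for illustration only.
base_labels = {
    "flow": ["UploadFlow", "TestResultsFlow"],
    "checkpoint": ["BEGIN", "PROCESSING", "FINISHED", "ERROR"],
}

# Without a repo label: 2 flows x 4 checkpoints.
without_repo = series_count(base_labels)

# With a repo label restricted to a small allowlist of repositories.
allowlisted = series_count({**base_labels, "repo_id": list(range(3))})

# With a repo label for every repository (assuming 100,000 of them).
all_repos = series_count({**base_labels, "repo_id": list(range(100_000))})

print(without_repo, allowlisted, all_repos)
```

Restricting the label to a handful of repositories keeps the series count small, which is why a per-repository label is tolerable for one specific ask but not as a general mechanism.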

This PR:

- Adds new repository-based checkpoints that expect a repo_id
- Adds a checkpoint context variable to supply a typed context to metrics
  - Adjusts the files that instantiate a checkpoint class to provide that context
- Removes statsd and tests

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.


codecov-qa · 2024-10-18T22:37:31Z

Codecov Report

Attention: Patch coverage is 97.89474% with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.98%. Comparing base (90ebfa7) to head (f3db2b5).

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
helpers/checkpoint_logger/prometheus.py	95.00%	2 Missing ⚠️

@@            Coverage Diff             @@
##             main     #800      +/-   ##
==========================================
- Coverage   97.99%   97.98%   -0.01%     
==========================================
  Files         443      443              
  Lines       36513    36538      +25     
==========================================
+ Hits        35780    35803      +23     
- Misses        733      735       +2

Flag	Coverage Δ
integration	`97.98% <97.89%> (-0.01%)`	⬇️
unit	`97.98% <97.89%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
NonTestCode	`95.83% <97.18%> (-0.01%)`	⬇️
OutsideTasks	`97.97% <97.43%> (-0.01%)`	⬇️

Files with missing lines	Coverage Δ
helpers/checkpoint_logger/__init__.py	`94.85% <100.00%> (-0.15%)`	⬇️
helpers/tests/unit/test_checkpoint_logger.py	`99.61% <100.00%> (-0.01%)`	⬇️
rollouts/__init__.py	`100.00% <100.00%> (ø)`
tasks/notify.py	`95.18% <100.00%> (ø)`
tasks/notify_error.py	`100.00% <100.00%> (ø)`
tasks/upload.py	`96.15% <100.00%> (+0.01%)`	⬆️
tasks/upload_finisher.py	`97.73% <100.00%> (ø)`
helpers/checkpoint_logger/prometheus.py	`96.49% <95.00%> (-3.51%)`	⬇️

codecov-notifications · 2024-10-18T22:37:31Z

Codecov Report


codecov-public-qa · 2024-10-18T22:37:43Z

Codecov Report


codecov · 2024-10-19T02:30:12Z

Codecov Report

Attention: Patch coverage is 97.89474% with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.98%. Comparing base (90ebfa7) to head (f3db2b5).

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
helpers/checkpoint_logger/prometheus.py	95.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #800      +/-   ##
==========================================
- Coverage   97.99%   97.98%   -0.01%     
==========================================
  Files         443      443              
  Lines       36513    36538      +25     
==========================================
+ Hits        35780    35803      +23     
- Misses        733      735       +2

Flag	Coverage Δ
integration	`97.98% <97.89%> (-0.01%)`	⬇️
unit	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
NonTestCode	`95.83% <97.18%> (-0.01%)`	⬇️
OutsideTasks	`97.97% <97.43%> (-0.01%)`	⬇️

Files with missing lines	Coverage Δ
helpers/checkpoint_logger/__init__.py	`94.85% <100.00%> (-0.15%)`	⬇️
helpers/tests/unit/test_checkpoint_logger.py	`99.61% <100.00%> (-0.01%)`	⬇️
rollouts/__init__.py	`100.00% <100.00%> (ø)`
tasks/notify.py	`95.18% <100.00%> (ø)`
tasks/notify_error.py	`100.00% <100.00%> (ø)`
tasks/upload.py	`96.15% <100.00%> (+0.01%)`	⬆️
tasks/upload_finisher.py	`97.73% <100.00%> (ø)`
helpers/checkpoint_logger/prometheus.py	`96.49% <95.00%> (-3.51%)`	⬇️

Swatinem · 2024-10-21T12:49:03Z

helpers/checkpoint_logger/__init__.py

-    def log_counters(obj: T) -> None:
-        metrics.incr(f"{klass.__name__}.events.{obj.name}")
-        PROMETHEUS_HANDLER.log_checkpoints(flow=klass.__name__, checkpoint=obj.name)
+    def log_counters(obj: T, context: CheckpointContext = None) -> None:


Suggested change

-    def log_counters(obj: T, context: CheckpointContext = None) -> None:
+    def log_counters(obj: T, context: CheckpointContext | None = None) -> None:

the same pattern repeats a bunch of times below. If the default is None, then None has to appear in the type as well.

Swatinem · 2024-10-21T12:50:28Z

helpers/checkpoint_logger/__init__.py

    ):
        self.cls = cls
        self.data = data if data else {}
        self.kwargs_key = _kwargs_key(self.cls)
        self.strict = strict
+        self.context = context if context else {}


This should just be context in this case, as defaulting to {} does not really make sense.

Swatinem · 2024-10-21T12:51:26Z

helpers/checkpoint_logger/prometheus.py

@@ -67,28 +126,46 @@ class PrometheusCheckpointLoggerHandler:
    methods in this class are mainly used by the CheckpointLogger class.
    """

-    def log_begun(self, flow: str):
+    def log_begun(self, flow: str, repo_id: int = None):


Suggested change

-    def log_begun(self, flow: str, repo_id: int = None):
+    def log_begun(self, flow: str, repo_id: int | None = None):

same here, the argument type needs to match up with the default value.

Swatinem · 2024-10-21T12:55:06Z

helpers/checkpoint_logger/__init__.py

+    kwargs: MutableMapping[str, Any],
+    strict: bool = False,
+    context: CheckpointContext = None,


for this particular case, you could just pick repo_id (aka repoid?) out of the kwargs.

I don't think that would work off the bat, since repoid is specified as a keyword parameter and therefore isn't part of **kwargs when it's supplied to the checkpoints functionality, for instance here: https://github.com/codecov/worker/blob/5514b97f08b80b0615ea4122e1c3edab89dd3d45/tasks/upload.py#L300-L301. Unless you meant re-adding it to the kwargs object before we supply it. As an add-on, I created the context to a) type the supplied params and b) serve as an object that holds items beyond the checkpoint flow. I'm open to a different approach; I chose this one for readability and separation of concerns. (I will rename repo_id to repoid to be consistent with our model definition, though.)

Addressed your other suggestions, thanks 🙏
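The point about named parameters not showing up in `**kwargs` can be seen in a few lines. This is a generic Python sketch; `run_impl`, its parameters, and the `checkpoints_UploadFlow` key are hypothetical and not copied from `tasks/upload.py`.

```python
def run_impl(repoid: int, commitid: str, **kwargs) -> dict:
    # Named parameters are bound by name, so they never land in **kwargs.
    return kwargs


captured = run_impl(repoid=123, commitid="abc", checkpoints_UploadFlow={})

# Picking repoid out of kwargs would only work if the caller re-adds it first.
readded = {**captured, "repoid": 123}

print(sorted(captured), sorted(readded))
```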


matt-codecov · 2024-10-21T21:46:54Z

helpers/checkpoint_logger/__init__.py

+@dataclass
+class CheckpointContext:
+    repoid: int
+
+


thoughts on saving this in self.data["context"] and making it a TypedDict instead of a dataclass? if it's in self.data["context"] then you only have to check it once at the start of the flow and it'll be passed along like the rest of self.data automatically. i think TypedDict can be auto-serialized to JSON which is necessary to pass it in celery task arguments

i think the type of self.data would have to change to something like Optional[MutableMapping[T | str, int | CheckpointContext]]. a little messy.
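A quick sketch of the serialization difference behind this suggestion, assuming the task arguments go through a JSON serializer (stood in for here by `json.dumps`). The `ContextDict` and `ContextDataclass` names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import TypedDict


class ContextDict(TypedDict, total=False):
    repoid: int


@dataclass
class ContextDataclass:
    repoid: int


# A TypedDict instance is a plain dict at runtime, so it serializes directly.
typed = ContextDict(repoid=123)
as_json = json.dumps({"context": typed})

# A dataclass instance is not JSON serializable without a conversion step.
try:
    json.dumps({"context": ContextDataclass(repoid=123)})
    dataclass_serializable = True
except TypeError:
    dataclass_serializable = False

# Converting with asdict() first gives the same payload as the TypedDict.
converted = json.dumps({"context": asdict(ContextDataclass(repoid=123))})
```

The trade-off is that a TypedDict only helps static type checkers; nothing validates the keys at runtime.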

matt-codecov · 2024-10-21T21:53:48Z

helpers/checkpoint_logger/__init__.py

+    cls: type[T],
+    kwargs: MutableMapping[str, Any],
+    strict: bool = False,
+    context: CheckpointContext | None = None,


if you take the TypedDict/self.data["context"] suggestion, you won't need to take this in from_kwargs. you will need to do a little extra work though

    def from_kwargs(...):
        # Copy so we don't modify the passed-in kwargs
        data = kwargs.get(_kwargs_key(cls), {}).copy()
        deserialized_data = {}

        # Remove the "context" key. All remaining keys should be castable to checkpoints
        deserialized_data["context"] = data.pop("context", CheckpointContext())

        # for loop can remain the same
        for checkpoints, timestamp in data.items():
            ...

So I had to play with this a bit to make it work. The kwargs.get(_kwargs_key(cls), {}).copy() is an empty object the first time a flow is run, so instead I did

    def from_kwargs(
        cls: type[T],
        kwargs: MutableMapping[str, Any],
        strict: bool = False,
    ) -> CheckpointLogger[T]:
        context = kwargs.pop("context", CheckpointContext())
        data = kwargs.get(_kwargs_key(cls), {})

        # kwargs has been deserialized into a Python dictionary, but our enum values
        # are deserialized as simple strings. We need to ensure the strings are all
        # proper enum values as best we can, and then downcast to enum instances.
        deserialized_data = {}
        deserialized_data["context"] = context
        for checkpoint, timestamp in data.items():

It could be prettier, but that's what I could think of. wdyt?
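For reference, here is a runnable sketch of the idea discussed in this thread, not the code that was merged. It assumes a TypedDict context, a hypothetical `UploadFlow` enum, and a hypothetical `kwargs_key` helper standing in for `_kwargs_key`. It returns the deserialized dict instead of a `CheckpointLogger`, and it prefers a context already stored in the flow data over a top-level `context` kwarg, which is a design choice of this sketch.

```python
from enum import Enum
from typing import Any, MutableMapping, TypedDict


class CheckpointContext(TypedDict, total=False):
    repoid: int


class UploadFlow(Enum):  # hypothetical flow, for illustration only
    BEGIN = "BEGIN"
    FINISHED = "FINISHED"


def kwargs_key(cls: type) -> str:
    """Hypothetical stand-in for _kwargs_key."""
    return f"checkpoints_{cls.__name__}"


def from_kwargs(cls: type[Enum], kwargs: MutableMapping[str, Any]) -> dict:
    # Copy so the caller's flow data is not modified.
    data = dict(kwargs.get(kwargs_key(cls), {}))

    # On the first task the flow data is empty, so fall back to a top-level
    # "context" kwarg; on later tasks the context travels inside the data.
    context = data.pop("context", None) or kwargs.pop("context", CheckpointContext())

    deserialized: dict = {"context": context}
    for checkpoint, timestamp in data.items():
        # Keys arrive as plain strings after JSON; cast them back to enum members.
        deserialized[cls(checkpoint)] = timestamp
    return deserialized


first = from_kwargs(UploadFlow, {"context": {"repoid": 123}})
later = from_kwargs(
    UploadFlow,
    {kwargs_key(UploadFlow): {"context": {"repoid": 123}, "BEGIN": 1000}},
)
```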


matt-codecov · 2024-10-21T22:02:02Z

helpers/tests/unit/test_checkpoint_logger.py

+    @patch("helpers.checkpoint_logger.prometheus.CHECKPOINT_ENABLED_REPOSITORIES")
+    def test_reliability_counters_with_context(self, mock_object):
+        repoid = 123
+        mock_object.return_value = repoid


mock_object is the feature object, not its check_value() function. i wonder why it appears to work anyway?
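One possible explanation, sketched with `unittest.mock` only: setting `return_value` on the patched object configures what calling the object itself returns, while any attribute such as `check_value` is an auto-created `MagicMock` whose result is truthy. The `check_value` arguments below are hypothetical, and whether the production code relies on truthiness is an assumption.

```python
from unittest.mock import MagicMock

# Stands in for the patched CHECKPOINT_ENABLED_REPOSITORIES feature object.
feature = MagicMock()
feature.return_value = 123  # affects feature(), not feature.check_value()

# check_value is auto-created, and calling it returns another MagicMock.
result = feature.check_value(identifier=123, default=False)
is_auto_mock = isinstance(result, MagicMock)

# A MagicMock is truthy, so an "if feature.check_value(...)" branch is taken.
passes_truthiness_check = bool(result)

# Configuring the method itself gives the test real control over the branch.
feature.check_value.return_value = False
explicit = feature.check_value(identifier=123, default=False)
```

Under that assumption the test would pass for any repoid, so asserting on `feature.check_value.return_value` is the more meaningful setup.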


adrian-codecov added 4 commits October 18, 2024 15:22

Add repoid to prometheus checkpoint fns

e6d00e6

Merge branch 'main' of github.com:codecov/worker into 2686-add-checkp…

df23dd7

…oint-with-repoid

remove prints

1c45cd8

delete undesired line

2b92ba6

Swatinem reviewed Oct 21, 2024

View reviewed changes

adrian-codecov added 4 commits October 21, 2024 10:36

Merge branch 'main' of github.com:codecov/worker into 2686-add-checkp…

5514b97

…oint-with-repoid

Adjust repo_id to repoid + default values

3921a79

Add testst

d83f845

Mkerge branch 'main' of github.com:codecov/worker into 2686-add-check…

8d962d9

…point-with-repoid

matt-codecov requested changes Oct 21, 2024

View reviewed changes

adrian-codecov added 3 commits October 21, 2024 18:58

Refactor implementation to take context in data object

ff664e8

adjust test

19a15ba

get rid of prints

f3db2b5
