Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement HTTP metrics for Connectors #6067

Merged
merged 12 commits into from
Oct 15, 2024

Conversation

pubmodmatt
Copy link
Contributor

@pubmodmatt pubmodmatt commented Sep 26, 2024

Add standard metrics for connectors HTTP requests and responses:

  • http.client.request.duration
  • http.client.request.body.size
  • http.client.response.body.size

Selectors are defined for:

  • subgraph_name
  • connector_source
  • connector_http_request_header
  • connector_http_response_header
  • connector_http_response_status
  • connector_http_method
  • connector_url_template
  • static

Attributes:

  • subgraph.name
  • connector.source.name
  • connector.http.method
  • connector.url.template

The implementation uses the HTTP client service, as a single call to the connector service can result in multiple HTTP calls. The metrics make more sense at the HTTP client level anyway. However, since the instrumentation types are generic over a single request/response type, this means we'll be unable to have any other instruments at the connector service level that use the same selectors. However, that should be fine as the interesting data is at the HTTP level for connectors.

Since the HTTP client service is not public, this required making the telemetry plugin private. This in turn required some small changes to the plugin test harness, which was previously incompatible with private plugins.

Examples

Add a metric for the number of 404 response codes from a particular connector source API:

telemetry:
  instrumentation:
    instruments:
      connector:
        not.found.count:
          value: unit
          type: counter
          unit: count
          description: "Count of 404 responses from the user API"
          condition:
            all:
              - eq:
                - 404
                - connector_http_response_status: code
              - eq:
                - "user_api"
                - connector_source: name

Add connector attributes to the http.client.request.duration instrument:

http.client.request.duration:
  attributes:
    subgraph.name: true
    connector.source:
      connector_source: name
    connector.http.method: true
    connector.url.template: true

Create a histogram of the remaining rate limit from an API based on a response header value:

rate.limit:
  value:
    connector_http_response_header: "x-ratelimit-remaining"
  unit: count
  type: histogram
  description: "Rate limit remaining"
  condition:
    eq:
      - "user_api"
      - connector_source: name

It's also now possible to configure events for connectors:

telemetry:
  instrumentation:
    events:
      connector:
        request: off
        response: off
        error: error
        connector.request.event:
          message: "Connector request"
          level: info
          on: request
          attributes:
            http_method:
              connector_http_method: true
            url_template:
              connector_url_template: true
        connector.response.event:
          message: "Connector response"
          level: info
          on: response
          attributes:
            http_method:
              connector_http_method: true
            url_template:
              connector_url_template: true
            response_status:
              connector_http_response_status: code

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@pubmodmatt pubmodmatt self-assigned this Sep 26, 2024
@pubmodmatt pubmodmatt requested review from a team and nicholascioli as code owners September 26, 2024 15:45
@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from 6459d7c to 6de78c3 Compare September 26, 2024 20:31
Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of config for instruments:

telemetry:
  instrumentation:
    instruments:
      subgraph:
        http.client.request.duration:
          attributes:
            subgraph.name: true
            connector.source:
              connector_source: name
            subgraph.http.method: true
            subgraph.graphql.operation.type: string
            connector.url.template: true
        specific.metrics.for.a.connector:
          condition:
            eq:
            - service_kind: true
            - connector
          value:
            connector_http_response_header: "x-ratelimit-remaining"
          unit: count
          type: histogram
          description: "my desc"

We also have to think about testing other instrumentation elements like events and spans using the new selectors and attributes

@@ -264,6 +272,78 @@ impl DefaultForLevel for SubgraphAttributes {
}
}

#[derive(Deserialize, JsonSchema, Clone, Default, Debug)]
#[serde(deny_unknown_fields, default)]
pub(crate) struct ConnectorHttpAttributes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put this in subgraph attributes. Let's only keep subgraph attributes for now, and as it's namespaced by connector. It will be enough, I also think we shouldn't have both connector.http.method and subgraph.http.method we should just keep the subgraph one for now. If people wants to know if it's a connector or a subgraph I would suggest adding in SubgraphSelelector something like service_kind which could be connector or subgraph. (I don't think service_kind is the right naming but you got the idea).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that the relationship between subgraph and connector is not 1-to-1. There can be a number of connectors on a given subgraph, and there can be multiple calls to each connector per GraphQL request. So a connector isn't really a "kind" of subgraph - the subgraph service lifecycle stage is not invoked for connectors. The lifecycle is really now like this:

                 ------ subgraph service ------ http client service
                |
fetch service --|
                |                                ------- http client service
                 ------ connector service ------|
                                                 ------- http client service

The fetch service and connector service aren't currently exposed as plugin lifecycle hooks, but likely will be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm indeed, I didn't know we could have several http calls for a connector. Any idea @BrynCooke ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's here is the way forward, and then we add http_client service support to the telemetry plugin.

I know it's a big job, but it feels like the right thing to do.

@@ -81,6 +86,11 @@ pub(crate) struct InstrumentsConfig {
SubgraphInstrumentsConfig,
Instrument<SubgraphAttributes, SubgraphSelector, SubgraphValue>,
>,
/// Connector service instruments. For more information see documentation on Router lifecycle.
pub(crate) connector: Extendable<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I think everything should be in subgraph for now. To keep only subgraph. Because I think usually people will want to have the same attributes even if it's a connector or subgraph (if not they would still be able to know using the selector and a condition to add specific attributes or metrics if it's a connector for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue here is that HTTP connectors are only one kind of connector - there will be more. So unlike subgraphs, where every request is HTTP, we may have connector requests that are SQL, gRPC, etc. As we go forward with these, sharing attributes between connectors and subgraph would make less and less sense, since the attributes would never apply to a subgraph.

@@ -3569,4 +3852,203 @@ mod test {
Some("default".into())
);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need tests including both subgraph calls and connectors calls to make sure it works properly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test with both connectors and subgraph, let me know if it's sufficient.

apollo-router/src/plugins/telemetry/mod.rs Show resolved Hide resolved
@bnjjj
Copy link
Contributor

bnjjj commented Sep 27, 2024

cc @BrynCooke I need your review here too

Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this is looking good. Can we pull the connectors stuff out into a separate module though rather than adding to the already large attributes.rs instruments.rs and selectors.rs

We should eventually be aiming for a separate module for each router pipeline stage.

Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just needs more tests for instruments but also for events and spans, feel free to ask if you need help or if you don't find these existing tests

Comment on lines 3856 to 3888
const TEST_SUBGRAPH_NAME: &str = "test_subgraph_name";
const TEST_SOURCE_NAME: &str = "test_source_name";
const TEST_URL_TEMPLATE: &str = "/test";
const TEST_HEADER_NAME: &str = "test_header_name";
const TEST_HEADER_VALUE: &str = "test_header_value";
const TEST_STATIC: &str = "test_static";

#[fixture]
fn connector_info() -> ConnectorInfo {
ConnectorInfo {
subgraph_name: TEST_SUBGRAPH_NAME.to_string(),
source_name: Some(TEST_SOURCE_NAME.to_string()),
http_method: HTTPMethod::Get.as_str().to_string(),
url_template: TEST_URL_TEMPLATE.to_string(),
}
}

#[fixture]
fn context(connector_info: ConnectorInfo) -> RouterContext {
let context = RouterContext::default();
context
.insert(CONNECTOR_INFO_CONTEXT_KEY, connector_info)
.unwrap();
context
}

#[fixture]
fn http_request(context: RouterContext) -> HttpRequest {
HttpRequest {
http_request: http::Request::builder().body("".into()).unwrap(),
context,
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel confident introducing this technique to debug. We try as much as we can to lower the compilation time and if we introduce macros in our tests it will increase the time. Could you try to use the same kind of tests we did before. If it's something you would like to introduce in the router codebase then I would suggest to talk about this during one of our router arch meeting because we need opinion from the team. Also we have yaml based tests in apollo-router/src/plugins/telemetry/config_new/fixtures Could you add more tests directly there please ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel confident introducing this technique to debug. We try as much as we can to lower the compilation time and if we introduce macros in our tests it will increase the time. Could you try to use the same kind of tests we did before. If it's something you would like to introduce in the router codebase then I would suggest to talk about this during one of our router arch meeting because we need opinion from the team. Also we have yaml based tests in apollo-router/src/plugins/telemetry/config_new/fixtures Could you add more tests directly there please ?

I replaced the use of rstest with just plain unit tests, and added more tests in fixtures.

apollo-router/src/plugins/telemetry/mod.rs Show resolved Hide resolved
@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from ae88b1b to 970b75d Compare October 1, 2024 23:51
@pubmodmatt
Copy link
Contributor Author

pubmodmatt commented Oct 1, 2024

It just needs more tests for instruments but also for events and spans, feel free to ask if you need help or if you don't find these existing tests

I've added support for connectors HTTP events, and added tests for that in the places I found existing tests for other events. Please let me know if I didn't find the tests you were looking for.

Spans are another matter - I haven't added support for spans in this PR. The HTTP client service isn't really set up for this - it creates the http_request span internally, so there's no way to add attributes to it at the plugin hook level. This is something we can look at adding in the future, but changes here could impact other uses of that service besides connectors.

@bnjjj - let me know if this sounds ok, and if the tests I added are what you were looking for

@pubmodmatt pubmodmatt requested review from BrynCooke and bnjjj October 1, 2024 23:56
@pubmodmatt
Copy link
Contributor Author

pubmodmatt commented Oct 2, 2024

In general this is looking good. Can we pull the connectors stuff out into a separate module though rather than adding to the already large attributes.rs instruments.rs and selectors.rs

We should eventually be aiming for a separate module for each router pipeline stage.

@BrynCooke - I moved connectors related things out to a connectors module, and there's an http module under that for these HTTP specific things. This required making some of the common types pub(super).

@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from 970b75d to 0a6ba87 Compare October 2, 2024 22:33
@pubmodmatt pubmodmatt requested a review from a team as a code owner October 3, 2024 19:46
@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Oct 3, 2024

✅ Docs Preview Ready

No new or changed pages found.

Copy link
Contributor

@bnjjj bnjjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, could you also open an issue explaining the connector selectors are not already supported for spans link it directly in this PR please ? Also could be great to document that limitation in selectors docs and spans docs probably.

@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/instruments branch from 3801f57 to 994d598 Compare October 4, 2024 19:59
@bnjjj
Copy link
Contributor

bnjjj commented Oct 7, 2024

Hey @pubmodmatt I opened a PR #6125 to also add the ability to configure span attributes for connectors. It's wip and needs tests. I'll probably need your help to locally set up a supergraph working with connectors and also add integration tests.

When writing this PR I noticed couple of things I didn't see when reviewing.

  • In order to be consistent with our current codebase I think the way you're handling rust modules should switch to the mod.rs model. You'll see the changes in my PR. If you disagree then I think that's something we should discuss as a team and keep this mechanism everywhere in order to be consistent and not create confusions.
  • Also for consistency I think we should not have http subsection in the connector instruments and connector events configuration as the attributes and metrics are already prefixed by http.* and that's how we did for other instruments and events.

@pubmodmatt
Copy link
Contributor Author

pubmodmatt commented Oct 7, 2024

@bnjjj -

  • In order to be consistent with our current codebase I think the way you're handling rust modules should switch to the mod.rs model. You'll see the changes in my PR. If you disagree then I think that's something we should discuss as a team
    and keep this mechanism everywhere in order to be consistent and not create confusions.

Use of mod.rs has been discouraged since Rust 1.3. That said, since the codebase is still using it, I can see the consistency argument. However, note that one of the reasons this is discouraged is for consistency. For example, the existing telemetry module has submodules config in telemetry/config.rs and config_new in telemetry/config_new/mod.rs. With the newer style, they'd be more consistently telemetry/config.rs and telemetry/config_new.rs.

  • Also for consistency I think we should not have http subsection in the connector instruments and connector events configuration as the attributes and metrics are already prefixed by http.* and that's how we did for other instruments and events.

I'm open to suggestions on this. My concern is when we add a new connector for SQL, and presumably a new SQL client service with its own request and response types, for example. Say we have this;

telemetry:
  instrumentation:
    events:
      connector:
        request: off
        response: off

That would turn off events for both HTTP and SQL. There would be no way to turn them on separately. Alternatively, this would give the ability to turn HTTP or SQL events on or off separately:

telemetry:
  instrumentation:
    events:
      connector:
        http:
          request: off
          response: info
        sql:
          request: off
          response: off

@bnjjj
Copy link
Contributor

bnjjj commented Oct 7, 2024

Both of your arguments are correct. For modules I would suggest adding this topic for the next router arch meeting, by doing this we could decide together what's the best strategy to adopt.

For the http section I was thinking about something a little bit different for your use case. That's already what we can do for specific GraphQL operation for example. But basically as you can set conditions on events I was thinking about using conditions to say something like if connector.kind is sql then display events. That's already what we do. What do you think ?

@pubmodmatt
Copy link
Contributor Author

pubmodmatt commented Oct 7, 2024

if connector.kind is sql then display events. That's already what we do. What do you think ?

That makes sense. I'll need to add connector type as an attribute, and for now it will just always be http.

Actually, this won't work for the same reasons I mentioned earlier - the StandardEventConfig and Condition types are generic on the Selector type, and the Selector type has associated types for the request and response types. So you need separate config for each kind of connector, as they have different request and response types.

@pubmodmatt
Copy link
Contributor Author

For modules I would suggest adding this topic for the next router arch meeting

Done.

@bnjjj
Copy link
Contributor

bnjjj commented Oct 8, 2024

Actually, this won't work for the same reasons I mentioned earlier - the StandardEventConfig and Condition types are generic on the Selector type, and the Selector type has associated types for the request and response types. So you need separate config for each kind of connector, as they have different request and response types.

I agree about the implementation but I don't think our implementation issues should be reflected in the configuration so I think we should use flatten from serde like we did here. I think it would solve the implementation. What do you think ?

@pubmodmatt
Copy link
Contributor Author

our implementation issues should be reflected in the configuration so I think we should use flatten from serde like we did here. I think it would solve the implementation. What do you think ?

Agreed that the configuration shouldn't expose these implementation issues. I think we can solve for the request and response types using enums - working on coding that up now. I'll look at flatten as well.

@bnjjj
Copy link
Contributor

bnjjj commented Oct 15, 2024

@pubmodmatt One last thing, could you add metrics for connectors like we have for subgraph here https://github.com/apollographql/router/blob/pubmodmatt/connectors/instruments/apollo-router/src/configuration/metrics.rs#L350 please ?

@BrynCooke
Copy link
Contributor

@pubmodmatt One last thing, could you add metrics for connectors like we have for subgraph here https://github.com/apollographql/router/blob/pubmodmatt/connectors/instruments/apollo-router/src/configuration/metrics.rs#L350 please ?

I think this could be a separate PR.

@pubmodmatt
Copy link
Contributor Author

One last thing, could you add metrics for connectors like we have for subgraph here

Thanks @bnjjj - I didn't know about that metric. I'll create a new PR to add that.

@pubmodmatt pubmodmatt merged commit ce71c17 into next Oct 15, 2024
11 of 13 checks passed
@pubmodmatt pubmodmatt deleted the pubmodmatt/connectors/instruments branch October 15, 2024 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants