-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement HTTP metrics for Connectors #6067
Conversation
apollo-federation/src/sources/connect/validation/graphql/strings.rs
Outdated
Show resolved
Hide resolved
6459d7c
to
6de78c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example of config for instruments:
telemetry:
instrumentation:
instruments:
subgraph:
http.client.request.duration:
attributes:
subgraph.name: true
connector.source:
connector_source: name
subgraph.http.method: true
subgraph.graphql.operation.type: string
connector.url.template: true
specific.metrics.for.a.connector:
condition:
eq:
- service_kind: true
- connector
value:
connector_http_response_header: "x-ratelimit-remaining"
unit: count
type: histogram
description: "my desc"
We also have to think about testing other instrumentation elements like events and spans using the new selectors and attributes
@@ -264,6 +272,78 @@ impl DefaultForLevel for SubgraphAttributes { | |||
} | |||
} | |||
|
|||
#[derive(Deserialize, JsonSchema, Clone, Default, Debug)] | |||
#[serde(deny_unknown_fields, default)] | |||
pub(crate) struct ConnectorHttpAttributes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should put this in subgraph attributes. Let's only keep subgraph attributes for now, and as it's namespaced by connector. It will be enough, I also think we shouldn't have both connector.http.method
and subgraph.http.method
we should just keep the subgraph one for now. If people wants to know if it's a connector or a subgraph I would suggest adding in SubgraphSelelector
something like service_kind
which could be connector
or subgraph
. (I don't think service_kind
is the right naming but you got the idea).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that the relationship between subgraph and connector is not 1-to-1. There can be a number of connectors on a given subgraph, and there can be multiple calls to each connector per GraphQL request. So a connector isn't really a "kind" of subgraph - the subgraph service lifecycle stage is not invoked for connectors. The lifecycle is really now like this:
------ subgraph service ------ http client service
|
fetch service --|
| ------- http client service
------ connector service ------|
------- http client service
The fetch service and connector service aren't currently exposed as plugin lifecycle hooks, but likely will be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm indeed, I didn't know we could have several http calls for a connector. Any idea @BrynCooke ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what's here is the way forward, and then we add http_client
service support to the telemetry plugin.
I know it's a big job, but it feels like the right thing to do.
@@ -81,6 +86,11 @@ pub(crate) struct InstrumentsConfig { | |||
SubgraphInstrumentsConfig, | |||
Instrument<SubgraphAttributes, SubgraphSelector, SubgraphValue>, | |||
>, | |||
/// Connector service instruments. For more information see documentation on Router lifecycle. | |||
pub(crate) connector: Extendable< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I think everything should be in subgraph for now. To keep only subgraph. Because I think usually people will want to have the same attributes even if it's a connector or subgraph (if not they would still be able to know using the selector and a condition to add specific attributes or metrics if it's a connector for example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another issue here is that HTTP connectors are only one kind of connector - there will be more. So unlike subgraphs, where every request is HTTP, we may have connector requests that are SQL, gRPC, etc. As we go forward with these, sharing attributes between connectors and subgraph would make less and less sense, since the attributes would never apply to a subgraph.
@@ -3569,4 +3852,203 @@ mod test { | |||
Some("default".into()) | |||
); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need tests including both subgraph calls and connectors calls to make sure it works properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test with both connectors and subgraph, let me know if it's sufficient.
cc @BrynCooke I need your review here too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this is looking good. Can we pull the connectors stuff out into a separate module though rather than adding to the already large attributes.rs
instruments.rs
and selectors.rs
We should eventually be aiming for a separate module for each router pipeline stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just needs more tests for instruments but also for events and spans, feel free to ask if you need help or if you don't find these existing tests
const TEST_SUBGRAPH_NAME: &str = "test_subgraph_name"; | ||
const TEST_SOURCE_NAME: &str = "test_source_name"; | ||
const TEST_URL_TEMPLATE: &str = "/test"; | ||
const TEST_HEADER_NAME: &str = "test_header_name"; | ||
const TEST_HEADER_VALUE: &str = "test_header_value"; | ||
const TEST_STATIC: &str = "test_static"; | ||
|
||
#[fixture] | ||
fn connector_info() -> ConnectorInfo { | ||
ConnectorInfo { | ||
subgraph_name: TEST_SUBGRAPH_NAME.to_string(), | ||
source_name: Some(TEST_SOURCE_NAME.to_string()), | ||
http_method: HTTPMethod::Get.as_str().to_string(), | ||
url_template: TEST_URL_TEMPLATE.to_string(), | ||
} | ||
} | ||
|
||
#[fixture] | ||
fn context(connector_info: ConnectorInfo) -> RouterContext { | ||
let context = RouterContext::default(); | ||
context | ||
.insert(CONNECTOR_INFO_CONTEXT_KEY, connector_info) | ||
.unwrap(); | ||
context | ||
} | ||
|
||
#[fixture] | ||
fn http_request(context: RouterContext) -> HttpRequest { | ||
HttpRequest { | ||
http_request: http::Request::builder().body("".into()).unwrap(), | ||
context, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel confident introducing this technique to debug. We try as much as we can to lower the compilation time and if we introduce macros in our tests it will increase the time. Could you try to use the same kind of tests we did before. If it's something you would like to introduce in the router codebase then I would suggest to talk about this during one of our router arch meeting because we need opinion from the team. Also we have yaml based tests in apollo-router/src/plugins/telemetry/config_new/fixtures
Could you add more tests directly there please ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel confident introducing this technique to debug. We try as much as we can to lower the compilation time and if we introduce macros in our tests it will increase the time. Could you try to use the same kind of tests we did before. If it's something you would like to introduce in the router codebase then I would suggest to talk about this during one of our router arch meeting because we need opinion from the team. Also we have yaml based tests in
apollo-router/src/plugins/telemetry/config_new/fixtures
Could you add more tests directly there please ?
I replaced the use of rstest
with just plain unit tests, and added more tests in fixtures
.
ae88b1b
to
970b75d
Compare
I've added support for connectors HTTP events, and added tests for that in the places I found existing tests for other events. Please let me know if I didn't find the tests you were looking for. Spans are another matter - I haven't added support for spans in this PR. The HTTP client service isn't really set up for this - it creates the @bnjjj - let me know if this sounds ok, and if the tests I added are what you were looking for |
@BrynCooke - I moved connectors related things out to a |
970b75d
to
0a6ba87
Compare
✅ Docs Preview ReadyNo new or changed pages found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, could you also open an issue explaining the connector selectors are not already supported for spans link it directly in this PR please ? Also could be great to document that limitation in selectors docs and spans docs probably.
apollo-router/src/plugins/telemetry/config_new/connector/http/events.rs
Outdated
Show resolved
Hide resolved
apollo-router/src/plugins/telemetry/config_new/connector/http/events.rs
Outdated
Show resolved
Hide resolved
docs/source/configuration/telemetry/instrumentation/selectors.mdx
Outdated
Show resolved
Hide resolved
docs/source/configuration/telemetry/instrumentation/selectors.mdx
Outdated
Show resolved
Hide resolved
3801f57
to
994d598
Compare
Hey @pubmodmatt I opened a PR #6125 to also add the ability to configure span attributes for connectors. It's wip and needs tests. I'll probably need your help to locally set up a supergraph working with connectors and also add integration tests. When writing this PR I noticed couple of things I didn't see when reviewing.
|
@bnjjj -
Use of
I'm open to suggestions on this. My concern is when we add a new connector for SQL, and presumably a new SQL client service with its own request and response types, for example. Say we have this; telemetry:
instrumentation:
events:
connector:
request: off
response: off That would turn off events for both HTTP and SQL. There would be no way to turn them on separately. Alternatively, this would give the ability to turn HTTP or SQL events on or off separately: telemetry:
instrumentation:
events:
connector:
http:
request: off
response: info
sql:
request: off
response: off |
Both of your arguments are correct. For modules I would suggest adding this topic for the next router arch meeting, by doing this we could decide together what's the best strategy to adopt. For the http section I was thinking about something a little bit different for your use case. That's already what we can do for specific GraphQL operation for example. But basically as you can set conditions on events I was thinking about using conditions to say something like if connector.kind is sql then display events. That's already what we do. What do you think ? |
Actually, this won't work for the same reasons I mentioned earlier - the |
Done. |
I agree about the implementation but I don't think our implementation issues should be reflected in the configuration so I think we should use |
Agreed that the configuration shouldn't expose these implementation issues. I think we can solve for the request and response types using enums - working on coding that up now. I'll look at |
@pubmodmatt One last thing, could you add metrics for connectors like we have for subgraph here https://github.com/apollographql/router/blob/pubmodmatt/connectors/instruments/apollo-router/src/configuration/metrics.rs#L350 please ? |
I think this could be a separate PR. |
Thanks @bnjjj - I didn't know about that metric. I'll create a new PR to add that. |
Add standard metrics for connectors HTTP requests and responses:
http.client.request.duration
http.client.request.body.size
http.client.response.body.size
Selectors are defined for:
subgraph_name
connector_source
connector_http_request_header
connector_http_response_header
connector_http_response_status
connector_http_method
connector_url_template
static
Attributes:
subgraph.name
connector.source.name
connector.http.method
connector.url.template
The implementation uses the HTTP client service, as a single call to the connector service can result in multiple HTTP calls. The metrics make more sense at the HTTP client level anyway. However, since the instrumentation types are generic over a single request/response type, this means we'll be unable to have any other instruments at the connector service level that use the same selectors. However, that should be fine as the interesting data is at the HTTP level for connectors.
Since the HTTP client service is not public, this required making the telemetry plugin private. This in turn required some small changes to the plugin test harness, which was previously incompatible with private plugins.
Examples
Add a metric for the number of 404 response codes from a particular connector source API:
Add connector attributes to the
http.client.request.duration
instrument:Create a histogram of the remaining rate limit from an API based on a response header value:
It's also now possible to configure events for connectors:
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩