Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persisted queries have client names #6198

Merged
merged 11 commits into from
Dec 6, 2024
Merged

persisted queries have client names #6198

merged 11 commits into from
Dec 6, 2024

Conversation

glasser
Copy link
Member

@glasser glasser commented Oct 25, 2024

(This depends on functionality of GraphOS that is not currently enabled for all users, as well as an unreleased version of Rover.)

The persisted query manifest fetched from Uplink can now contain a clientName field in each operation. Two operations with the same id but different clientName are considered to be distinct operations (and may have distinct bodies).

Router resolves the client name by taking the first of these which exists:

  • Reading the apollo_persisted_queries::client_name context key (which may be set by a router_service plugin)
  • Reading the HTTP header named by telemetry.apollo.client_name_header (which defaults to apollographql-client-name)

If a client name can be resolved for a request, Router first tries to find a persisted query with the specified ID and the resolved client name.

If there is no operation with that ID and client name, or if a client name cannot be resolved, Router tries to find a persisted query with the specified ID and no client name specified. (This means that existing PQ lists that do not contain client names will continue to work.)

This PR also improves the internal "samples" integration test suite:

  • Tests can be specified in YAML in addition to JSON
  • Unknown fields in test plans lead to test failure instead of being ignored
  • Test "Request" actions can have descriptions that are printed (to help find which step failed)

The existing PQ integration test is moved into the "samples" suite.

@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Oct 25, 2024

✅ Docs Preview Ready

Configuration
{
  "repoOverrides": {
    "apollographql/router@main": {
      "remote": {
        "owner": "apollographql",
        "repo": "router",
        "branch": "glasser/pq-client-name"
      }
    }
  }
}
Name Link

Commit

54e79ca

Preview

https://www.apollographql.com/docs/deploy-preview/9f521e9e48ad13c0c2d9

Build ID

9f521e9e48ad13c0c2d9

File Changes

0 new, 9 changed, 0 removed
* graphos/reference/router/telemetry/instrumentation/standard-attributes.mdx
* graphos/reference/router/telemetry/instrumentation/standard-instruments.mdx
* graphos/reference/router/telemetry/instrumentation/selectors.mdx
* graphos/reference/router/telemetry/metrics-exporters/datadog.mdx
* graphos/reference/router/telemetry/trace-exporters/datadog.mdx
* graphos/reference/router/configuration.mdx
* graphos/routing/observability/client-awareness.mdx
* graphos/routing/security/persisted-queries.mdx
* graphos/routing/configure-your-router.mdx

Pages

9


9 pages published. Build will be available for 30 days.

This comment has been minimized.

@router-perf
Copy link

router-perf bot commented Oct 25, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

{
if let Some(persisted_query_body) = manifest_poller.get_operation_body(
persisted_query_id,
request.context.get(CLIENT_NAME).unwrap_or_default(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably implies that the new feature only works if Apollo telemetry is enabled. This is probably a reasonable constraint for an initial release, but we should at least document it.

.cloned()
{
Some(body)
} else if client_name.is_some() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this else if? It looks like we would be returning the null operation for the a request for an operation with a client_name. If client_name is Some and nothing exists in state.persisted_query_manifest for it, None feels like the right return?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this is saying is — if we don't have an operation that is specifically designated for this client name, try one that was published without a client name. (If we didn't have this, all PQs would suddenly stop working for any clients that send a client name as soon as you upgraded to this version.)

The reason it's a conditional at all instead of just an "else try client_name: None" is because if client_name is None, it would be making the same exact call twice, which seems wasteful.

I should add comments to this function though!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooohhhh riiight! that totally makes sense! comment will definitely help, but logic is definitely good! thanks :)

@glasser glasser force-pushed the glasser/pq-client-name branch 2 times, most recently from e20b1d2 to 7348223 Compare November 26, 2024 20:42
hashmap! {"accounts".to_string() => 1}
);

// Successfully run a persisted query with client name that has its own operation,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to add an integration test that tests the "client name read from the header named by telemetry.apollo.client_name_header because I'm not sure if there's an easy way to enable the part of the telemetry code that reads the header without setting up a full reporting agent situation. (Maybe it would be OK for the PQ code to directly read this config field from the configuration instead? I had trouble getting that to work though.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other integration tests that actually spin up a router, see the apollo-router/tests directory. This enables a full testing of a feature including telemetry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the directory I'm in?

That said, this becomes trivial if I'm able to make the change mentioned above where the telemetry config is respected even when telemetry isn't enabled.

@@ -172,7 +172,7 @@ pub(crate) mod tracing;
pub(crate) mod utils;

// Tracing consts
const CLIENT_NAME: &str = "apollo_telemetry::client_name";
pub(crate) const CLIENT_NAME: &str = "apollo_telemetry::client_name";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR involves various pub and pub(crate) changes; this one is to make the actual logic work, and others are to make the integration tests work. I don't know if that's appropriate, just that it makes it build.

@glasser
Copy link
Member Author

glasser commented Nov 27, 2024

I think this is ready for an initial review. It still needs docs and a changeset entry, but it is code complete and has tests. I'd appreciate advice on how to write one particular test as mentioned in my comment.

request
.supergraph_request
.headers()
.get(client_name_header_default_str())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be right if the user has configured a custom client header name.
Telemetry puts the client name into the context. So why do we need to check for a header at all here?

Copy link
Member Author

@glasser glasser Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed reasonable to have the feature not be completely broken if telemetry isn't enabled (which, eg, was the case in the integration tests). Eg if you're running your router locally with your client that's built to sent PQs. That said I feel like I'd be happier if I could get the implementation to be "honor telemetry.apollo.client_name_header even if APOLLO_KEY isn't set" but I couldn't quite figure out a comfortable way to get access to the plugin configuration from outside of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I suppose another potential approach would be to refactor the telemetry plugin somehow so that it puts the client name in the context even if the actual reporting mechanism isn't enabled via API key.)

hashmap! {"accounts".to_string() => 1}
);

// Successfully run a persisted query with client name that has its own operation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other integration tests that actually spin up a router, see the apollo-router/tests directory. This enables a full testing of a feature including telemetry.

@glasser
Copy link
Member Author

glasser commented Nov 27, 2024

(oops, I'd been testing with cargo build and running my integration test and I missed that I broke the unit tests.)

@glasser glasser force-pushed the glasser/pq-client-name branch from 3941062 to 80f2c83 Compare November 27, 2024 20:24
assert_eq!(actual.data, None);
assert_eq!(registry.totals(), hashmap! {"accounts".to_string() => 1});

// We didn't break normal GETs.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular sub-test got lost because the samples files don't let me specify GET requests with checked responses.

@glasser
Copy link
Member Author

glasser commented Nov 27, 2024

@BrynCooke OK, this still lacks docs/changelog, but it's using a simpler form of testing that makes it more clear that the telemetry plugin is always loaded, so I was able to remove the apollographql-client-name fallback.

@glasser glasser force-pushed the glasser/pq-client-name branch from 784cc06 to f985c5e Compare November 28, 2024 05:50
Copy link
Contributor

@BrynCooke BrynCooke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great with the new json based tests.
I still think unit testable parts should be unit tested though.

@abernix abernix changed the title [FOUN-575] persisted queries have client names persisted queries have client names Nov 29, 2024
@glasser glasser force-pushed the glasser/pq-client-name branch from f985c5e to 03a70d1 Compare December 3, 2024 02:10
Copy link
Member Author

@glasser glasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrynCooke OK, docs and changeset and unit tests done! I think this is ready for final consideration. Hoping it gets into v1.59.0! I'll coordinate the Rover and GraphOS-server-side changes.

@@ -138,6 +138,20 @@ To enable safelisting, you _must_ turn off [automatic persisted queries](/router

</Note>

### Customization via request context
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there may be a second entry in this section soon depending on how ROUTER-890 is implemented.

@glasser glasser marked this pull request as ready for review December 4, 2024 01:23
@glasser glasser requested review from a team as code owners December 4, 2024 01:23
@glasser glasser force-pushed the glasser/pq-client-name branch from 921fe00 to bf7ba2c Compare December 4, 2024 01:24
@glasser glasser force-pushed the glasser/pq-client-name branch from bf7ba2c to 8b459b7 Compare December 5, 2024 02:47
@glasser
Copy link
Member Author

glasser commented Dec 5, 2024

!docs preview main

This depends on unreleased functionality of GraphOS.
@glasser glasser force-pushed the glasser/pq-client-name branch 2 times, most recently from 0797d3e to 2bbe77c Compare December 5, 2024 22:14
@glasser
Copy link
Member Author

glasser commented Dec 5, 2024

!docs preview main

@glasser glasser force-pushed the glasser/pq-client-name branch from 2bbe77c to 54e79ca Compare December 5, 2024 22:40
@glasser
Copy link
Member Author

glasser commented Dec 5, 2024

!docs preview main

@BrynCooke BrynCooke merged commit eb2bca6 into dev Dec 6, 2024
13 checks passed
@BrynCooke BrynCooke deleted the glasser/pq-client-name branch December 6, 2024 16:29
@BrynCooke BrynCooke mentioned this pull request Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants