Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

$request.headers and $response.headers for selection mapping in connectors #6638

Open
wants to merge 10 commits into
base: dev
Choose a base branch
from

Conversation

andrewmcgivery
Copy link
Contributor

$request.headers and $response.headers for selection mapping in connectors

image


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Jan 23, 2025

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: ea92f30c1a446204a607f085

@router-perf
Copy link

router-perf bot commented Jan 23, 2025

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

Copy link
Contributor

@andrewmcgivery, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.

@andrewmcgivery andrewmcgivery marked this pull request as ready for review January 23, 2025 21:59
@andrewmcgivery andrewmcgivery requested review from a team as code owners January 23, 2025 21:59
@@ -239,6 +240,32 @@ impl JSONSelection {
.flat_map(|var_path| var_path.variable_reference())
.map(|var_ref| var_ref.namespace.namespace)
}

/// Get any headers referenced in the variable references by looking at both Request and Response namespaces.
pub fn header_references(&self) -> impl Iterator<Item = String> + '_ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json selection shouldn't know or care about headers, i think. is there another location for this code, preferably in or around an http module?

Copy link
Contributor Author

@andrewmcgivery andrewmcgivery Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is, the transport doesn't contain the connector's JSON selection for the response, which is why I have to grab the variables from the connect.selection.

A possible alternative is I could maybe update external_variables() to generically return the variables instead of just the namespaces and apply the additional filtering/mapping for both response_variables and response_headers in Connector instead of in JSONSelection.

That way JSONSelection still exposes which variables are used in the selection but it's up to Connector to "convert" those to what it cares about.

Does that sound like a reasonable alternative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm trying to remember if there was a specific reason why we didn't expose the full external variables outside of this module. it might have been because they're made up of json selection components that aren't meant for use by other code? @benjamn might have some suggestions here.

i do think it's generally the right approach, but maybe there's an abstraction we want to add that doesn't expose json selection bits.

i was thinking that if we had the whole variable path, we could also do some logging on startup if a connector references $config.whatever and it's not present in the YAML!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this code to ../models.rs (where the other header_references logic lives). We can make anything pub(crate) that needs to be, to accommodate that move.

As long as Namespace::Request and Namespace::Response are defined, $request and $response should behave just like any other known variable.

apollo-router/src/plugins/connectors/make_requests.rs Outdated Show resolved Hide resolved
.headers
.iter()
.filter_map(|(key, value)| {
if headers_used.contains::<String>(&key.as_str().into()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above:

Suggested change
if headers_used.contains::<String>(&key.as_str().into()) {
if headers_used.contains(key.as_str()) {

Copy link
Member

@benjamn benjamn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for tackling this nontrivial improvement @andrewmcgivery!

While I support the general idea, I'm worried the choice to use object-based $request.headers and $response.headers maps assumes header names will be unique (see comments below), whereas HTTP headers can have multiple values per header.

Is there another representation we can use to expose the headers that takes this HTTP reality into account? Accessing a map where the values are always arrays gets a little less ergonomic (maybe a new ->header(name) method could help?), but I don't see another way to preserve multiple values separately, so I think we need something closer to a multimap than a key-value object.

We might be able to represent multiple keys by concatenating them with commas (and documenting that policy), but that needs to happen somewhere, instead of letting the last value win.

Comment on lines +68 to +71
/// The request headers referenced in the connectors request mapping
pub request_headers: HashSet<String>,
/// The request or response headers referenced in the connectors response mapping
pub response_headers: HashSet<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places where we handle HTTP headers, we've gone to some trouble to make sure you can provide multiple values for the same header, as allowed by HTTP.

So, for example, you pass a list of headers to the @source directive, rather than a simple map:

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.10")
  @link(
    url: "https://specs.apollo.dev/connect/v0.1"
    import: ["@source", "@connect"]
  )
  @source(
    name: "v1"
    http: {
      baseURL: "https://api.example.com/v1"
      headers: [
        { name: "x-api-key", from: "x-api-key" }
        { name: "x-caller", value: "apollo-router" }
      ]
    }
  )

Unless I'm missing something about the assumptions we're making about request/response header uniqueness, we might want to apply the same logic here.

@@ -239,6 +240,32 @@ impl JSONSelection {
.flat_map(|var_path| var_path.variable_reference())
.map(|var_ref| var_ref.namespace.namespace)
}

/// Get any headers referenced in the variable references by looking at both Request and Response namespaces.
pub fn header_references(&self) -> impl Iterator<Item = String> + '_ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this code to ../models.rs (where the other header_references logic lives). We can make anything pub(crate) that needs to be, to accommodate that move.

As long as Namespace::Request and Namespace::Response are defined, $request and $response should behave just like any other known variable.

Comment on lines +103 to +119
let headers: Map<ByteString, Value> = supergraph_request
.headers()
.iter()
.filter_map(|(key, value)| {
if headers_used.contains(key.as_str()) {
return Some((
key.as_str().into(),
value.to_str().unwrap_or_default().into(),
));
}

None
})
.collect();
let request_object = json!({
"headers": Value::Object(headers)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so here's an example of where we turn actual request headers into the $request object. It's noteworthy that the supergraph_request.headers() method returns a HeaderMap (a multimap), and HeaderMap::iter handles multi-value headers by yielding the same header multiple times with different values, which is information that's lost by this filter_map code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to represent multiple keys by concatenating them with commas (and documenting that policy), but that needs to happen somewhere, instead of letting the last value win.

Comma separated makes sense to me. I can see a world where we provide a "split" function and then select the "n-th" element if there are multiple values and they need to select one.

My gut feeling is that while it is certainly possible to have multiple values for the same HTTP header, most of the time headers used in connectors would likely only have one value... thinking this is an 80/20 rule.

Assuming 80% of the time it'll have one value, I'd rather go with the easiest ergonomics for that case and provide a solution for the 20% that there will be multiple values.

This is definitely assumptions and subjective opinion though so if we know of a very common case where we'll have multiple values, happy to be wrong here :D

Copy link
Member

@benjamn benjamn Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our choices here need to be consistent with our choices elsewhere, and it seems like we mostly respect multimap behavior for HTTP headers elsewhere, so it's a departure to disregard that here.

Even if the single-value case is much more common, it's still disappointing as a developer to realize a tool you're using isn't really aligned with the rules/possibilities of HTTP, for those few cases where it matters. You can speculate about the distribution of use cases, but the weird edge cases (needing to interact with an API that abuses multi-value HTTP headers, for example) happen for everyone eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm unclear on what the suggestion is here if not the comma separated list route. That sounded like a very reasonable compromise but I'm unclear if we're okay with that tradeoff or if you're suggesting a different approach. 😅

if not $request.headers.x, what does it look like instead? And how does the user access value a versus value b of the same header?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants