Releases: apollographql/router
v1.13.1
🚀 Features
Router homepage now supports redirecting to Apollo Studio Explorer (PR #2282)
In order to replicate the landing-page experience (called "homepage" on the Router) which was available in Apollo Gateway, we've introduced a graph_ref option to the homepage configuration. This allows users to be (optionally, as a sticky preference) redirected from the Apollo Router homepage directly to the correct graph in Apollo Studio Explorer.
Since users may have their own preference on the value, we do not automatically infer the graph reference (e.g., graph@variant), instead requiring that the user set it to the value of their choice.
For example:
homepage:
  graph_ref: my-org-graph@production
By @flyboarder in #2282
New metric for subgraph-requests, including "retry" and "break" events (Issue #2518), (Issue #2736)
We now emit an apollo_router_http_request_retry_total metric from the Router. The metric also offers observability into aborted requests via a status = "aborted" attribute on the metric.
New receive_body span represents time spent consuming a client's request body (Issue #2518), (Issue #2736)
When running with debug-level instrumentation, the Router now emits a receive_body span which tracks time spent receiving the request body from the client.
🐛 Fixes
Use single Deno runtime for query planning (Issue #2690)
We now keep the same JavaScript-based query-planning runtime alive for the entirety of the Router's lifetime, rather than disposing of it and creating a new one at several points in time, including when processing GraphQL requests, generating an "API schema" (the publicly queryable version of the supergraph, with private fields excluded), and when processing introspection queries.
Not only is this a preferable architecture that is more considerate of system resources, but the previous approach was also responsible for a memory leak which occurred during supergraph changes.
We believe this will alleviate, but not entirely solve, the circumstances seen in the above-linked issue.
v1.13.0
🚀 Features
Uplink metrics and improved logging (Issue #2769, Issue #2815, Issue #2816)
For monitoring, observability and debugging requirements around Uplink-related behaviors (those which occur as part of Managed Federation), the router now emits better log messages and new metrics around these facilities. The new metrics are:
- apollo_router_uplink_duration_seconds_bucket: A histogram of durations with the following attributes:
  - url: The URL that was polled
  - query: SupergraphSdl or Entitlement
  - type: new, unchanged, http_error, uplink_error, or ignored
  - code: The error code, depending on type
  - error: The error message
- apollo_router_uplink_fetch_count_total: A gauge that counts the overall success (status="success") or failure (status="failure") counts that occur when communicating to Uplink without taking into account fallback.
⚠️ The very first poll to Uplink is unable to capture metrics since it's so early in the router's lifecycle that telemetry hasn't yet been set up. We consider this a suitable trade-off and don't want to allow perfect to be the enemy of good.
Here's an example of what these new metrics look like from the Prometheus scraping endpoint:
# HELP apollo_router_uplink_fetch_count_total apollo_router_uplink_fetch_count_total
# TYPE apollo_router_uplink_fetch_count_total gauge
apollo_router_uplink_fetch_count_total{query="SupergraphSdl",service_name="apollo-router",status="success"} 1
# HELP apollo_router_uplink_fetch_duration_seconds apollo_router_uplink_fetch_duration_seconds
# TYPE apollo_router_uplink_fetch_duration_seconds histogram
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.001"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.005"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.015"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.05"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.1"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.2"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.3"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.4"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="1"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="10"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="+Inf"} 1
apollo_router_uplink_fetch_duration_seconds_sum{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/"} 0.465257131
apollo_router_uplink_fetch_duration_seconds_count{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/"} 1
By @BrynCooke in #2779, #2817, #2819, #2826
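The scrape output above is easier to read once you recall that Prometheus histogram buckets are cumulative: each le series counts every observation at or below its bound, the +Inf bucket equals _count, and _sum adds up the observed values. The minimal sketch below (std only; cumulative_buckets and sum_and_count are hypothetical helpers, not Router code) uses the bucket bounds from that output to show why a single poll of about 0.465 seconds reports 0 up to le="0.4" and 1 from le="0.5" onwards.

```rust
/// Cumulative bucket counts, as exposed by Prometheus `_bucket` series.
/// Each `le` bucket counts every observation less than or equal to its
/// upper bound, so counts never decrease as `le` grows.
fn cumulative_buckets(bounds: &[f64], observations: &[f64]) -> Vec<(f64, u64)> {
    let mut out: Vec<(f64, u64)> = bounds
        .iter()
        .map(|&le| {
            let n = observations.iter().filter(|&&v| v <= le).count() as u64;
            (le, n)
        })
        .collect();
    // The implicit `+Inf` bucket always holds the total number of observations.
    out.push((f64::INFINITY, observations.len() as u64));
    out
}

/// `_sum` is the sum of all observed values; `_count` is how many there were.
fn sum_and_count(observations: &[f64]) -> (f64, u64) {
    (observations.iter().sum(), observations.len() as u64)
}
```

Because buckets are cumulative, the per-bucket count for a range is the difference between two adjacent le series, which is what histogram_quantile-style queries rely on.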
🐛 Fixes
Only process Uplink messages that are deemed to be newer (Issue #2794)
Uplink is backed by multiple cloud providers to ensure high availability. However, this means that there will be periods of time where Uplink endpoints do not agree on what the latest data is. They are eventually consistent.
This has not been a problem for most users, as the default mode of operation for the router is to fall back to the secondary Uplink endpoint if the first fails.
The other mode of operation is round-robin, which is triggered only when setting the APOLLO_UPLINK_ENDPOINTS environment variable. In this mode there is a much higher chance that the router will go back and forth between schema versions due to disagreement between the Apollo Uplink servers or any user-provided proxies set in this variable.
This change introduces two fixes:
- The Router will only use the fallback strategy. Uplink endpoints are not strongly consistent, and therefore it is better to always poll a primary source of information if available.
- Uplink already handled freshness of schema but now also handles entitlement freshness.
Note: We advise against using APOLLO_UPLINK_ENDPOINTS to try to cache Uplink responses for high-availability purposes. Each request to Uplink currently sends state, which limits the usefulness of such a cache.
By @BrynCooke in #2803, #2826, #2846
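The fallback strategy and the freshness check described above can be sketched as follows. This is a simplified illustration, not the Router's Uplink client: UplinkMessage, its ordering field (an assumed, monotonically increasing version marker), poll_with_fallback, apply_if_newer and the endpoint URLs used in the usage are all hypothetical.

```rust
/// Hypothetical successful Uplink poll result. `ordering` is an assumed
/// monotonically increasing marker used to decide which message is newer.
#[derive(Debug, Clone, PartialEq)]
struct UplinkMessage {
    ordering: u64,
    payload: String,
}

/// Fallback strategy: always try endpoints in their configured order and stop
/// at the first one that answers, so a healthy primary is always preferred
/// (no round-robin rotation between endpoints).
fn poll_with_fallback<F>(endpoints: &[&str], mut fetch: F) -> Option<UplinkMessage>
where
    F: FnMut(&str) -> Result<UplinkMessage, String>,
{
    for &url in endpoints {
        if let Ok(msg) = fetch(url) {
            return Some(msg);
        }
    }
    None
}

/// Freshness check: apply a message only if it is strictly newer than the one
/// already in service; stale or duplicate answers are ignored.
fn apply_if_newer(current: &mut Option<UplinkMessage>, incoming: UplinkMessage) -> bool {
    let is_newer = match current {
        Some(existing) => incoming.ordering > existing.ordering,
        None => true,
    };
    if is_newer {
        *current = Some(incoming);
    }
    is_newer
}
```

With both rules in place, a lagging secondary endpoint can still serve as a fallback, but it cannot roll the router back to an older schema or entitlement.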
Distributed caching: Don't send Redis' CLIENT SETNAME (PR #2825)
We no longer send the CLIENT SETNAME command to connected Redis servers. This resolves an incompatibility with some Redis-compatible servers, since not all "Redis-compatible" offerings (like Google Memorystore) actually support every Redis command. We weren't actually relying on this command; it was an optional feature of our Redis client. No Router functionality is impacted.
Support bare top-level __typename when aliased (Issue #2792)
PR #1762 implemented support for the query { __typename } but it did not work properly if the top-level standalone __typename field was aliased. This now works properly.
Maintain errors set on _entities (Issue #2731)
In their responses, some subgraph implementations do not return errors per entity but instead on the entire path. We now transmit those regardless.
📃 Configuration
Custom OpenTelemetry Datadog exporter mapping (Issue #2228)
This PR fixes the issue with the Datadog exporter not providing meaningful contextual data in the Datadog traces.
There is a known issue where OpenTelemetry is not fully compatible with Datadog.
To fix this, the opentelemetry-datadog crate added custom mapping functions.
Now, when enable_span_mapping is set to true, the Apollo Router will perform the following mapping:
- Use the OpenTelemetry span name to set the Datadog span operation name.
- Use the OpenTelemetry span attributes to set the Datadog span resource name.
For example:
Let's say we send a query MyQuery to the Apollo Router; based on the operation's query plan, the Router then sends a query to my-subgraph-name, producing the following trace:
| apollo_router request |
    | apollo_router router |
        | apollo_router supergraph |
        | apollo_router query_planning | apollo_router execution |
                                       | apollo_router fetch |
                                           | apollo_router subgraph |
                                               | apollo_router subgraph_request |
As you can see, there is no clear information about the name of the query, the name of the subgraph, or the name of the query sent to the subgraph.
Instead, with this new enable_span_mapping setting set to true, the following trace will be created:
| request /graphql |
    | router |
        | supergraph MyQuery |
        | query_planning MyQuery | execution ...
v1.12.1
🎈 This is a fast-follow to v1.12.0 which included many new updates and new GraphOS Enterprise features. Be sure to check that (longer, more detailed!) changelog for the full details. Thanks!
🐛 Fixes
Retain existing Apollo Uplink entitlements (PR #2781)
Our end-to-end integration testing revealed a bug newly introduced in v1.12.0 which could affect requests to Apollo Uplink endpoints located in different data centers, when those endpoints yield differing responses. This only impacted a very small number of cases, but retaining previously fetched values is undeniably more durable and fixes the problem, so we're expediting a fix.
By @BrynCooke in #2781
v1.12.0
🎈 In this release, we are excited to make three new features generally available to GraphOS Enterprise customers running self-hosted routers: JWT Authentication, Distributed APQ Caching, and External Coprocessor support. Read more about these features below, and see our documentation for additional information.
🚀 Features
GraphOS Enterprise: JWT Authentication
🎈 JWT Authentication is now generally available to GraphOS Enterprise customers running self-hosted routers. To fully account for the changes between the initial experimental release and the final generally available implementation, we recommend removing the experimental configuration and re-implementing it following the documentation below to ensure proper configuration and that all security requirements are met.
Router v1.12 adds support for JWT validation, claim extraction, and custom security policies in Rhai scripting to reject bad traffic at the edge of the graph — for enhanced zero-trust and defense-in-depth. Extracting claims one time in the router and securely forwarding them to subgraphs can reduce the operational burden on backend API teams, reduce JWT processing, and speed up response times with improved header matching for increased query deduplication.
See the JWT Authentication documentation for information on setting up this GraphOS Enterprise feature.
GraphOS Enterprise: Distributed APQ Caching
🎈 Distributed APQ Caching is now generally available to GraphOS Enterprise customers running self-hosted routers. To fully account for the changes between the initial experimental releases and the final generally available implementation, we recommend removing the experimental configuration and re-implementing it following the documentation below to ensure proper configuration.
With Router v1.12, you can now use distributed APQ caching to improve p99 latencies during peak times. A shared Redis instance can now be used by the entire router fleet to build the APQ cache faster and share existing APQ cache with new router instances that are spun up during scaling events – when they need it most. This ensures the fast path to query execution is consistently available to all users even during peak load.
See the distributed APQ caching documentation for information on setting up this GraphOS Enterprise feature.
GraphOS Enterprise: External Coprocessor support
🎈 External Coprocessor support is now generally available to GraphOS Enterprise customers running self-hosted routers. To fully account for the changes between the initial experimental releases and the final generally available implementation, we recommend removing the experimental configuration and re-implementing it following the documentation below to ensure proper configuration.
Router now supports external coprocessors written in your programming language of choice. Coprocessors run with full isolation and a clean separation of concerns, which decouples delivery and provides fault isolation. Low overhead can be achieved by running coprocessors alongside the router on the same host or in the same Kubernetes Pod as a sidecar. Coprocessors can be used to speed Gateway migrations, support bespoke use cases, or integrate the router with existing network services for custom auth (JWT mapping, claim enrichment), service discovery integration, and more!
See the external coprocessor documentation for information on setting up this GraphOS Enterprise feature.
TLS termination (Issue #2615)
If there is no intermediary proxy or load balancer capable of doing it, the router ends up responsible for terminating TLS. This can be relevant when HTTP/2 must be supported, since most implementations require TLS for it. We've introduced TLS termination support for the router using the rustls implementation, limited to one server certificate and using safe default ciphers. We do not support TLS versions prior to v1.2.
If you require more advanced TLS termination than this implementation offers, we recommend using a proxy which supports this (as is the case with most cloud-based proxies today).
Make initialDelaySeconds configurable for health check probes in Helm chart
Currently, initialDelaySeconds uses the default of 0. This means that Kubernetes gives the router no additional time before it does the first probe.
This can be configured as follows:
probes:
  readiness:
    initialDelaySeconds: 1
  liveness:
    initialDelaySeconds: 5
GraphQL errors can be thrown within Rhai (PR #2677)
Until now, a throw in a Rhai script could only yield an HTTP status code and a message string, which would end up as a GraphQL error.
This change allows users to throw with a valid GraphQL response body, which may include data, as well as errors and extensions.
Refer to the Terminating client requests section of the Rhai API documentation to learn how to throw GraphQL payloads.
By @o0Ignition0o in #2677
🐛 Fixes
In-flight requests will terminate before shutdown is completed (Issue #2539)
In-flight client requests will now be completed when the router is asked to shut down gracefully.
State machine will retain most recent valid config (Issue #2752)
The state machine will retain the current state until the new state has gone into service. Previously, if the router failed to reload either the configuration or the supergraph, it would discard the incoming state change even if that state change turned out to be invalid. It is important to avoid reloading inconsistent state because a new supergraph may, for example, directly rely on changes in config to work correctly.
Changing this behaviour means that the router must enter a "good" configuration state before it will reload, rather than reloading with potentially inconsistent state.
For example, previously:
- Router starts with valid supergraph and config.
- Router config is set to something invalid and restart doesn't happen.
- Router receives a new schema, the router restarts with the new supergraph and the original valid config.
Now, the latest information is used to restart the router:
- Router starts with valid schema and config.
- Router config is set to something invalid and restart doesn't happen.
- Router receives a new schema, but the router fails to restart because the config is still invalid.
By @BrynCooke in #2753
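To make the new behaviour concrete, here is a minimal Rust model of the retained-state rule. It assumes hypothetical Config and Schema types with a simple valid flag (the real router validates far more); the point it illustrates is that every reload pairs the latest config with the latest schema, rather than combining a new schema with an older, known-good config.

```rust
/// Hypothetical stand-ins for the router's configuration and supergraph.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    name: &'static str,
    valid: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    name: &'static str,
}

struct StateMachine {
    /// The (config, schema) pair currently in service.
    running: (Config, Schema),
    /// The most recent inputs received, kept even when they could not be applied.
    latest_config: Config,
    latest_schema: Schema,
}

impl StateMachine {
    fn new(config: Config, schema: Schema) -> Self {
        StateMachine {
            running: (config.clone(), schema.clone()),
            latest_config: config,
            latest_schema: schema,
        }
    }

    /// Reload using the latest config and the latest schema together. If that
    /// pair is not valid, keep serving the current state; never mix in an
    /// older config.
    fn try_reload(&mut self) -> bool {
        if !self.latest_config.valid {
            return false;
        }
        self.running = (self.latest_config.clone(), self.latest_schema.clone());
        true
    }

    fn on_config(&mut self, config: Config) -> bool {
        self.latest_config = config;
        self.try_reload()
    }

    fn on_schema(&mut self, schema: Schema) -> bool {
        self.latest_schema = schema;
        self.try_reload()
    }
}
```

Replaying the three steps listed above against this model: the invalid config is remembered, the new schema does not trigger a restart, and the router only moves forward once a valid config arrives.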
Ability to disable HTTP/2 for subgraphs (Issue #2063)
There are cases where balancing HTTP/2 connections to subgraphs behaves erratically. While we consider this a bug, users may disable HTTP/2 support to subgraphs in the short term while we work to find the root cause.
Tracing default service name restored (Issue #2641)
With this fix, the default tracing service name is restored to router.
By @BrynCooke in #2642
Header plugin now has a static plugin priority (Issue #2559)
Execution order of the headers plugin, which handles header forwarding, is now enforced. This ensures reliable behavior with other built-in plugins.
It is now possible to use custom attributes derived from headers within the telemetry plugin in addition to using the headers plugin to propagate/insert headers for subgraphs.
Add content-type header when publishing Datadog metrics (Issue #2697)
Add the required content-type header for publishing Datadog metrics from Prometheus:
  content-type: text/plain; version=0.0.4
By @ShaunPhillips in #2698
Sandbox Explorer endpoint URL is no longer editable (PR #2729)
The "Endpoint" in the Sandbox Explorer (which is served by default when running in development mode) is no longer editable, to prevent inadvertent changes. Sandbox is not generally useful with other endpoints, as CORS must be configured on the other host.
A hosted version of Sandbox Explorer without this restriction is still available if you need a version which allows editing.
By @mayakoneval in #2729
Argument parsing is now optional in the Executable builder (PR #2666)
The Executable builder was parsing command-line arguments, which caused issues when used as part of a larger application with its own set of command-line flags, leading to those arguments not being recognized by the router. This change allows parsing the arguments separately, then passing the required ones to the Executable builder directly. The default behav...
v1.11.0
🚀 Features
Support for UUID and Unix timestamp functions in Rhai (PR #2617)
When building Rhai scripts, you'll often need to add headers that either uniquely identify a request, or append timestamp information for processing information later, such as crafting a trace header or otherwise.
While the default timestamp() and similar functions (e.g. apollo_start) can be used, they can't be translated into an epoch.
This adds uuid_v4() and unix_now() functions to obtain a UUID and a Unix timestamp, respectively.
Show option to "Include Cookies" in Sandbox
Adds default support when using the "Include Cookies" toggle in the Embedded Sandbox.
Add a metric to track the cache size (Issue #2522)
We've introduced a new apollo_router_cache_size metric that reports the current size of in-memory caches. Like other metrics, it is available via OpenTelemetry Metrics, including Prometheus scraping.
Add a rhai global variable resolver and populate it (Issue #2628)
Rhai scripts cannot access Rust global constants by default, making cross-plugin communication via Context difficult.
This change introduces a new global variable resolver, populated with a Router global constant. It currently has three members:
- APOLLO_START -> should be used in place of apollo_start
- APOLLO_SDL -> should be used in place of apollo_sdl
- APOLLO_AUTHENTICATION_JWT_CLAIMS
You access a member of this variable as follows:
let my_var = Router.APOLLO_SDL;
We are removing the experimental APOLLO_AUTHENTICATION_JWT_CLAIMS constant, but we will retain the existing non-experimental constants for purposes of backwards compatibility.
We recommend that you shift to the new global constants since we will remove the old ones in a major breaking change release in the future.
Activate TLS for Redis cluster connections (Issue #2332)
This adds support for TLS connections in Redis Cluster mode, applied when the URLs use the rediss scheme.
By @Geal in #2605
Make terminationGracePeriodSeconds property configurable in the Helm chart
The terminationGracePeriodSeconds property is now configurable on the Deployment object in the Helm chart.
This can be useful when adjusting the default timeout values for the Router, and should always be a value slightly bigger than the Router timeout in order to ensure no requests are closed prematurely on shutdown.
The Router timeout is configured via traffic_shaping:
traffic_shaping:
  router:
    timeout: ...
🐛 Fixes
Properly emit histogram metrics via OpenTelemetry (Issue #2393)
With the "inexpensive" metrics selector, histograms were only reported as gauges, which caused them to be incorrectly interpreted when reaching Datadog.
Revisit OpenTelemetry integration (Issue #1812, Issue #2359, Issue #2338, Issue #2113, Issue #2113)
There were several issues with the existing OpenTelemetry integration in the Router which we are happy to have resolved with this re-factoring:
- Metrics would stop working after a schema or config update.
- Telemetry config could not be changed at runtime, instead requiring a full restart of the router.
- Logging format would vary depending on where the log statement existed in the code.
- On shutdown, the following message occurred frequently:
  OpenTelemetry trace error occurred: cannot send span to the batch span processor because the channel is closed
- And worst of all, it had a tendency to leak memory.
We have corrected these by revisiting the way we integrate with OpenTelemetry and the supporting tracing packages. The new implementation brings our usage in line with current best practices.
In addition, the testing coverage for telemetry in general has been significantly improved. For more details of what changed and why, take a look at #2358.
By @BrynCooke and @Geal and @bnjjj in #2358
Metrics attributes allow value types as defined by OpenTelemetry (Issue #2510)
Metrics attributes in OpenTelemetry allow the following types:
string
string[]
float
float[]
int
int[]
bool
bool[]
However, our configuration only allowed strings. This has been fixed, and therefore it is now possible to use booleans via environment variable expansion as metrics attributes.
For example:
telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        supergraph:
          static:
            - name: "my_boolean"
              value: ''
By @BrynCooke in #2616
Add missing status attribute on some metrics (PR #2593)
When labeling metrics, the Router did not consistently add the status attribute, resulting in an empty status. You'll now have status="500" for Router errors.
🛠 Maintenance
Upgrade to Apollo Federation v2.3.2
This brings in a patch update to our Federation support, bringing it to v2.3.2.
CORS: Give a more meaningful message for users who misconfigured allow_any_origin (PR #2634)
Allowing "any" origin in the router configuration can be done as follows:
cors:
  allow_any_origin: true
However, some intuition and familiarity with the CORS specification might also lead someone to configure it as follows:
cors:
  origins:
    - "*"
Unfortunately, this won't work and the error message received when it was attempted was neither comprehensive nor actionable:
ERROR panicked at 'Wildcard origin (`*`) cannot be passed to `AllowOrigin::list`. Use `AllowOrigin::any()` instead'
This usability improvement adds helpful instructions to the error message, pointing you to the correct pattern for setting up this behavior in the router:
Invalid CORS configuration: use `allow_any_origin: true` to set `Access-Control-Allow-Origin: *`
By @o0Ignition0o in #2634
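A check of this kind can be sketched as a validation step that runs before the CORS layer is built, so a literal "*" in origins yields the actionable error instead of a panic. OriginPolicy and build_allow_origin are hypothetical names used for illustration only; just the error text is taken from the message above.

```rust
/// Hypothetical result of validating the `cors` settings.
#[derive(Debug, PartialEq)]
enum OriginPolicy {
    /// Equivalent to `Access-Control-Allow-Origin: *`.
    Any,
    /// Only the listed origins are allowed.
    List(Vec<String>),
}

/// Validate `allow_any_origin` and `origins` up front, turning the wildcard
/// mistake into an actionable error rather than a panic deeper in the stack.
fn build_allow_origin(allow_any_origin: bool, origins: &[&str]) -> Result<OriginPolicy, String> {
    if allow_any_origin {
        return Ok(OriginPolicy::Any);
    }
    if origins.contains(&"*") {
        return Err(
            "Invalid CORS configuration: use `allow_any_origin: true` to set `Access-Control-Allow-Origin: *`"
                .to_string(),
        );
    }
    Ok(OriginPolicy::List(origins.iter().map(|o| o.to_string()).collect()))
}
```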
🧪 Experimental
Cleanup the error reporting in the experimental JWT authentication plugin (PR #2609)
Introduce a new AuthenticationError enum to document and consolidate various JWT processing errors that may occur.
v1.10.3
🐛 Fixes
Per-type metrics based on FTV1 from subgraphs (Issue #2551)
Since version 1.7.0, Apollo Router generates metrics directly instead of deriving them from traces being sent to Apollo Studio. However, these metrics were incomplete. This adds, based on data reported by subgraphs, the following:
- Statistics about each field of each type of the GraphQL type system
- Statistics about errors at each path location of GraphQL responses
By @SimonSapin in #2541
🛠 Maintenance
Run rustfmt on xtask/, too (Issue #2557)
Our xtask runs cargo fmt --all, which reformats Rust code in all crates of the workspace. However, the code of xtask itself is a separate workspace. In order for it to be formatted with the same configuration, running a second cargo command is required. This adds that second command, and applies the corresponding formatting.
Fixes #2557
By @SimonSapin in #2561
🧪 Experimental
Add support to JWT Authentication for JWK without specified alg
Prior to this change, the router would only make use of a JWK for JWT verification if the key had an alg property.
Now, the router searches through the set of configured JWKS (JSON Web Key Sets) to find the best matching JWK according to the following criteria:
- a matching kid and alg; or
- a matching kid and algorithm family (kty, per RFC 7517); or
- a matching algorithm family (kty)
The algorithm family is used when the JWKS contain a JWK for which no alg is specified.
v1.10.2
🐛 Fixes
Resolve incorrect nullification when using @interfaceObject with particular response objects
Note: This follows up on the v1.10.1 release which also attempted to fix this, but inadvertently excluded a required part of the fix due to an administrative oversight.
The Federation 2.3.x @interfaceObject feature implies that an interface type in the supergraph may be locally handled as an object type by some specific subgraphs. Therefore, such subgraphs may return objects whose __typename is the interface type in their response. In some cases, those __typename values were leading the Router to unexpectedly and incorrectly nullify the underlying objects. This was not caught in the initial integration of Federation 2.3.
🛠 Maintenance
Refactor Uplink implementation (Issue #2547)
The Apollo Uplink implementation within Apollo Router, which is used for fetching data from Apollo GraphOS, has been decomposed into a reusable component so that it can be used more generically for fetching artifacts. This generally improved code quality and resulted in several new tests being added.
Additionally, our round-robin fetching behaviour is now more durable. Previously, on failure, there would be a delay before trying the next round-robin URL. Now, all URLs will be tried in sequence until exhausted. If ultimately all URLs fail, then the usual delay is applied before trying again.
By @BrynCooke in #2537
Improve Changelog management through conventions and tooling (PR #2545, PR #2534)
New tooling and conventions adjust our "incoming changelog in the next release" mechanism to no longer rely on a single file, but instead leverage a "file per feature" pattern in conjunction with tooling to create that file.
This stubbing takes place through the use of a new command:
  cargo xtask changeset create
For more information on the process, read the README in the ./.changesets directory or consult the Pull Requests referenced above.
v1.10.1
🐛 Fixes
Federation v2.3.1 (Issue #2556)
Update to Federation v2.3.1 to fix a subtle bug in @interfaceObject.
🛠 Maintenance
Redis integration tests (Issue #2174)
We now have integration tests for Redis usage with Automatic Persisted Queries and query planning.
CI: Enable compliance checks except licenses.html update (Issue #2514)
In #1573, we removed the compliance checks for non-release CI pipelines, because cargo-about output would change ever so slightly on each run.
While many of the checks provided by the compliance check are license related, some checks prevent us from inadvertently downgrading libraries and needing to open, e.g., Issue #2512.
This set of changes includes the following:
- Introduce cargo xtask licenses to update licenses.html.
- Separate compliance (cargo-deny, which includes license checks) and licenses generation (cargo-about) in xtask.
- Enable compliance as part of our CI checks for each open PR.
- Update cargo xtask all so it runs tests, checks compliance and updates licenses.html.
- Introduce cargo xtask dev so it checks compliance and runs tests.
Going forward, when developing on the Router source:
- Use cargo xtask all to make sure everything is up to date before a release.
- Use cargo xtask dev before a PR.
As a last note, updating licenses.html is now driven by cargo xtask licenses, which is part of the release checklist and automated through our release tooling in xtask.
By @o0Ignition0o in #2520
Fix flaky tracing integration test (Issue #2548)
Disable federated-tracing (FTV1) in tests by lowering the sampling rate to zero so that consistent results are generated in test snapshots.
By @bryncooke in #2549
Update to Rust 1.67
We've updated the Minimum Supported Rust Version (MSRV) version to v1.67.
By @SimonSapin in #2496 and #2499
v1.10.0
🚀 Features
Update to Federation v2.3.0 (Issue #2465, Issue #2485 and Issue #2489)
This brings in Federation v2.3.0 execution support for:
- @interfaceObject (added to federation in federation#2277).
- the bug fix from federation#2294.
By @abernix and @o0Ignition0o in #2462
By @pcmanus in #2485 and #2489
Always deduplicate variables on subgraph entity fetches (Issue #2387)
Variable deduplication allows the router to reduce the number of entities that are requested from subgraphs if some of them are redundant, and as such reduce the size of subgraph responses. It has been available for a while but was not active by default. This is now always on.
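Conceptually, deduplication collapses identical entity representations before the fetch and fans the results back out afterwards. A minimal sketch, assuming representations are compared as serialized JSON strings with a stable key order (the router's own comparison may differ) and using hypothetical dedup_representations and expand helpers:

```rust
use std::collections::HashMap;

/// Deduplicate entity representations before sending them to a subgraph.
/// Returns the unique representations (in first-seen order) plus, for each
/// original position, the index of its unique representation.
fn dedup_representations(reps: &[String]) -> (Vec<String>, Vec<usize>) {
    let mut unique: Vec<String> = Vec::new();
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut mapping = Vec::with_capacity(reps.len());
    for rep in reps {
        // Reuse the index of an identical representation, or register a new one.
        let idx = *seen.entry(rep.as_str()).or_insert_with(|| {
            unique.push(rep.clone());
            unique.len() - 1
        });
        mapping.push(idx);
    }
    (unique, mapping)
}

/// Fan the subgraph's results (one per unique representation) back out to
/// every original position.
fn expand<T: Clone>(results: &[T], mapping: &[usize]) -> Vec<T> {
    mapping.iter().map(|&i| results[i].clone()).collect()
}
```

The subgraph only sees each distinct entity once, which is where the smaller request and response sizes come from.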
Add optional Access-Control-Max-Age header to CORS plugin (Issue #2212)
Adds a new option called max_age to the existing cors object, which sets the value returned in the Access-Control-Max-Age header. As was the case previously, when this value is not set, no value is returned.
It can be enabled using our standard time notation, as follows:
cors:
  max_age: 1day
By @osamra-rbi in #2331
Improved support for wildcards in supergraph.path configuration (Issue #2406)
You can now use a wildcard in the supergraph endpoint path like this:
supergraph:
  listen: 0.0.0.0:4000
  path: /graph*
In this example, the Router would respond to requests on both /graphql and /graphiql.
🐛 Fixes
Forbid caching PERSISTED_QUERY_NOT_FOUND responses (Issue #2502)
The router now sends a cache-control: private, no-cache, must-revalidate response header to clients, in addition to the PERSISTED_QUERY_NOT_FOUND error code that was already being sent on the response. This expanded behaviour occurs when a persisted query hash could not be found, and it is important because such responses should not be cached by intermediary proxies or CDNs: the client will need to send the full query directly to the Router on a subsequent request.
By @o0Ignition0o in #2503
Listen on root URL when /* is set in supergraph.path configuration (Issue #2471)
This resolves a regression which occurred in Router 1.8 when using wildcard notation on a path-boundary, as such:
supergraph:
  path: /*
This occurred due to an underlying Axum upgrade and resulted in a failure to listen on localhost when a path was absent. We now special-case /* to also listen on the URL without a path, so you're able to call http://localhost (for example).
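Conceptually, the special case expands the configured path into the set of routes to register. Here is a minimal sketch, with routes_for as a hypothetical helper rather than the Router's routing code:

```rust
/// Expand the configured supergraph path into the routes to register.
/// `/*` is special-cased to also register the bare root `/`.
fn routes_for(path: &str) -> Vec<String> {
    let mut routes = vec![path.to_string()];
    if path == "/*" {
        routes.push("/".to_string());
    }
    routes
}

fn main() {
    println!("{:?}", routes_for("/*"));
}
```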
Subgraph traffic shaping timeouts now return HTTP 504 status code (Issue #2360, Issue #2400)
There was a regression where timeouts resulted in an HTTP response of 500 Internal Server Error. This is now fixed, with a test to guarantee it: the status code is now 504 Gateway Timeout (instead of the previous 408 Request Timeout, which was also incorrect in that it blamed the client).
There is also a new metric called apollo_router_timeout that tracks when timeouts are triggered.
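The intended behavior can be sketched with the Rust standard library. The sketch runs a simulated subgraph call with a deadline, answers 504 when the deadline passes, and counts the event. The TIMEOUTS atomic is only a stand-in for the apollo_router_timeout metric, and call_with_timeout is hypothetical. The Router itself is asynchronous and does not use threads and channels like this.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Stand-in for the `apollo_router_timeout` metric (a plain counter here).
static TIMEOUTS: AtomicU64 = AtomicU64::new(0);

/// Simulate a subgraph call that takes `latency`, giving up after `timeout`.
/// Returns the HTTP status the client would see.
fn call_with_timeout(latency: Duration, timeout: Duration) -> u16 {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(latency); // pretend to wait on the subgraph
        let _ = tx.send(200u16); // the receiver may already be gone
    });
    match rx.recv_timeout(timeout) {
        Ok(status) => status,
        Err(RecvTimeoutError::Timeout) => {
            TIMEOUTS.fetch_add(1, Ordering::Relaxed);
            504 // Gateway Timeout: the upstream was too slow, not the client
        }
        Err(RecvTimeoutError::Disconnected) => 500,
    }
}

fn main() {
    let fast = call_with_timeout(Duration::from_millis(1), Duration::from_secs(2));
    let slow = call_with_timeout(Duration::from_millis(500), Duration::from_millis(10));
    println!("fast={fast} slow={slow} timeouts={}", TIMEOUTS.load(Ordering::Relaxed));
}
```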
Fix panic in schema parse error reporting (Issue #2269)
In order to support introspection, some definitions like type __Field { … } are implicitly added to schemas. This addition was done by string concatenation at the source level. In some cases, like unclosed braces, a parse error could be reported at a position beyond the size of the original source. This would cause a panic because only the unconcatenated string is sent to the error reporting library, miette.
Instead, the Router now parses introspection types separately and "concatenates" the definitions at the AST level.
By @SimonSapin in #2448
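The underlying hazard is easy to reproduce in isolation. An error offset computed against a concatenated string can be larger than the original source, and slicing with &source[offset..] then panics. The hypothetical snippet_at helper below uses str::get, which returns None instead. This illustrates the failure mode only; the actual fix described above was to merge definitions at the AST level.

```rust
/// Return the source text from `offset` onward, or `None` if the offset is
/// out of range (or not on a UTF-8 character boundary). `&source[offset..]`
/// would panic in those cases instead.
fn snippet_at(source: &str, offset: usize) -> Option<&str> {
    source.get(offset..)
}

fn main() {
    let user_schema = "type Query { a: Int"; // unclosed brace
    let builtin = "\ntype __Field { name: String! }";
    // Old approach: parse the concatenation, so a reported offset can point
    // past the end of the user's source. Simulate an error at the very end.
    let concatenated = format!("{user_schema}{builtin}");
    let error_offset = concatenated.len();
    match snippet_at(user_schema, error_offset) {
        Some(s) => println!("error near: {s:?}"),
        None => println!("offset {error_offset} is beyond the original source"),
    }
}
```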
Always accept compressed subgraph responses (Issue #2415)
Previously, subgraph response decompression was only supported when subgraph request compression was explicitly configured. This is now always active.
Fix handling of root query operations not named Query
If you'd mapped your root query type to a name other than the default using schema { query: OtherQuery }, some parsing code in the Router would incorrectly return an error, because it assumed the default name, Query. The same problem would have occurred if the root mutation type was not named Mutation.
This is now corrected and the Router understands the mapping.
By @SimonSapin in #2459
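The lookup the Router needed can be sketched as follows. When a schema definition block is present, it is authoritative. Otherwise the conventional names apply, which matches the GraphQL specification's defaults when the schema block is omitted. OperationKind and root_type_name are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum OperationKind {
    Query,
    Mutation,
    Subscription,
}

/// Resolve the root type name for an operation kind. `schema_def` holds the
/// entries of an explicit `schema { ... }` block, if the schema has one.
fn root_type_name(
    schema_def: Option<&HashMap<OperationKind, String>>,
    kind: OperationKind,
) -> Option<String> {
    match schema_def {
        // An explicit schema block is authoritative: no entry, no root type.
        Some(def) => def.get(&kind).cloned(),
        // Otherwise fall back to the conventional names (if those types exist).
        None => Some(
            match kind {
                OperationKind::Query => "Query",
                OperationKind::Mutation => "Mutation",
                OperationKind::Subscription => "Subscription",
            }
            .to_string(),
        ),
    }
}

fn main() {
    // schema { query: OtherQuery }
    let mut def = HashMap::new();
    def.insert(OperationKind::Query, "OtherQuery".to_string());
    println!("{:?}", root_type_name(Some(&def), OperationKind::Query));
}
```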
Remove the locations field from subgraph errors (Issue #2297)
Subgraph errors can come with a locations field indicating which part of the query was causing issues, but it refers to the subgraph query generated by the query planner, and we have no way of translating it to locations in the client query. To avoid confusion, we've removed this field from the response until we can provide a more coherent way to map these errors back to the original operation.
Emit metrics showing number of client connections (Issue #2384)
New metrics are available to track client connections:
apollo_router_session_count_total indicates the number of currently connected clients.
apollo_router_session_count_active indicates the number of in-flight GraphQL requests from connected clients.
This also fixes the behaviour when we reach the maximum number of file descriptors: instead of going into a busy loop, the router will wait a bit before accepting a new connection.
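One common way to maintain gauges like these is an RAII guard that increments on creation and decrements on drop, so counts stay correct on early returns. The sketch below assumes that approach. The two statics merely stand in for apollo_router_session_count_total and apollo_router_session_count_active; they are not the Router's metrics API.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Stand-ins for apollo_router_session_count_total (connected clients)
/// and apollo_router_session_count_active (in-flight requests).
static CONNECTED_CLIENTS: AtomicI64 = AtomicI64::new(0);
static IN_FLIGHT_REQUESTS: AtomicI64 = AtomicI64::new(0);

/// Increments a gauge when created and decrements it when dropped, so the
/// count stays correct on early returns and on unwinding panics.
struct GaugeGuard(&'static AtomicI64);

impl GaugeGuard {
    fn new(gauge: &'static AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::SeqCst);
        GaugeGuard(gauge)
    }
}

impl Drop for GaugeGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let _connection = GaugeGuard::new(&CONNECTED_CLIENTS);
    {
        let _request = GaugeGuard::new(&IN_FLIGHT_REQUESTS);
        println!(
            "connected={} in_flight={}",
            CONNECTED_CLIENTS.load(Ordering::SeqCst),
            IN_FLIGHT_REQUESTS.load(Ordering::SeqCst)
        );
    }
    // The request guard has been dropped here.
    println!("in_flight={}", IN_FLIGHT_REQUESTS.load(Ordering::SeqCst));
}
```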
--dev will no longer modify configuration that it does not directly touch (Issue #2404, Issue #2481)
Previously, the Router's --dev mode was operating against the configuration object model. This meant that it would sometimes replace pieces of configuration where it should have merely modified them. Now, --dev mode will override the following properties in the YAML config, but it will leave any adjacent configuration as it was:
homepage:
  enabled: false
include_subgraph_errors:
  all: true
plugins:
  experimental.expose_query_plan: true
sandbox:
  enabled: true
supergraph:
  introspection: true
telemetry:
  tracing:
    experimental_response_trace_id:
      enabled: true
By @BrynCooke in #2489
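The difference between replacing and modifying configuration can be sketched as a recursive merge over a YAML-like tree. Maps are merged key by key and any other value is overwritten, so keys the overlay does not mention survive. The Value enum and merge function are hypothetical; the Router operates on real YAML documents.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal YAML-like value tree.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Merge `overlay` into `base`. Maps are merged key by key; any other overlay
/// value replaces the base value. Keys absent from the overlay are untouched.
fn merge(base: &mut Value, overlay: &Value) {
    match (base, overlay) {
        (Value::Map(base_map), Value::Map(overlay_map)) => {
            for (key, over) in overlay_map {
                match base_map.get_mut(key) {
                    Some(existing) => merge(existing, over),
                    None => {
                        base_map.insert(key.clone(), over.clone());
                    }
                }
            }
        }
        (base, overlay) => *base = overlay.clone(),
    }
}

/// Convenience constructor for map values.
fn map(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

fn main() {
    // User config with a custom listen address.
    let mut config = map(vec![(
        "supergraph",
        map(vec![("listen", Value::Str("0.0.0.0:4000".to_string()))]),
    )]);
    // --dev style overlay touching only `introspection`.
    let dev = map(vec![("supergraph", map(vec![("introspection", Value::Bool(true))]))]);
    merge(&mut config, &dev);
    println!("{config:?}"); // `listen` is preserved alongside `introspection`
}
```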
🛠 Maintenance
Improve #[serde(default)] attribute on structs (Issue #2424)
If all the fields of your struct have default values, use #[serde(default)] on the struct instead of on each field. If a field needs a specific default value, you'll have to write your own impl Default for the struct.
Correct approach
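use serde::Deserialize;
use url::Url;
// `default_url_fn()` is assumed to be defined elsewhere and to return a `Url`.
#[derive(Deserialize)]
#[serde(deny_unknown_fields, default)]
struct Export {
    url: Url,
    enabled: bool,
}
impl Default for Export {
    fn default() -> Self {
        Self {
            url: default_url_fn(),
            enabled: false,
        }
    }
}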
Discouraged approach
#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
struct Export {
    #[serde(default = "default_url_fn")]
    url: Url,
    #[serde(default)]
    enabled: bool,
}
📃 Configuration
Configuration changes will be automatically migrated on load. However, you should update your source configuration files as these will become breaking changes in a future major release.
health-check has been renamed to health_check (Issue #2161)
The health-check option in the configuration has been renamed to health_check, using snake_case rather than kebab-case for consistency with the other properties in the configuration:
-health-check:
+health_check:
   enabled: true
By @BrynCooke in #2451 and #2463
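A migration of this kind can be sketched as renaming a top-level key while keeping its value. The rename_key helper is hypothetical. In this sketch, an existing health_check key wins over the old spelling; that precedence is an assumption.

```rust
use std::collections::BTreeMap;

/// Rename a top-level configuration key, keeping its value. If the new key
/// is already present, it wins and the old value is discarded (assumption).
fn rename_key<V>(config: &mut BTreeMap<String, V>, from: &str, to: &str) {
    if let Some(value) = config.remove(from) {
        config.entry(to.to_string()).or_insert(value);
    }
}

fn main() {
    // Values are simplified to strings for this sketch.
    let mut config = BTreeMap::new();
    config.insert("health-check".to_string(), "enabled: true".to_string());
    rename_key(&mut config, "health-check", "health_check");
    println!("{config:?}");
}
```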
📚 Documentation
Disabling anonymous usage metrics (Issue #2478)
To disable the anonymous usage metrics, set APOLLO_TELEMETRY_DISABLED=true in the environment. The documentation previously said to use 1 as the value instead of true. In the future, either will work, so this is primarily a...
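A check that accepts either spelling could look like the following sketch. Trimming whitespace and treating the value case-insensitively are assumptions. flag_is_set is a hypothetical helper rather than the Router's actual parsing.

```rust
use std::env;

/// Interpret an opt-out flag, accepting both `true` and `1`.
/// Trimming and case-insensitivity are assumptions of this sketch.
fn flag_is_set(value: Option<&str>) -> bool {
    let normalized = value.map(|v| v.trim().to_ascii_lowercase());
    matches!(normalized.as_deref(), Some("true") | Some("1"))
}

/// Reads APOLLO_TELEMETRY_DISABLED from the process environment.
fn telemetry_disabled() -> bool {
    flag_is_set(env::var("APOLLO_TELEMETRY_DISABLED").ok().as_deref())
}

fn main() {
    println!("telemetry disabled: {}", telemetry_disabled());
}
```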
v1.9.0
🚀 Features
Add support for base64::encode() / base64::decode() in Rhai (Issue #2025)
Two new functions, base64::encode() and base64::decode(), have been added to the capabilities available within Rhai scripts to Base64-encode or Base64-decode strings, respectively.
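For reference, the sketch below shows the transformation those functions perform. It assumes the standard RFC 4648 alphabet with = padding; the text does not specify the variant, so that is an assumption. The Rust code implements the encoding over bytes with no dependencies. It illustrates the encoding only and is not how the Router exposes these functions to Rhai.

```rust
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 encoding with `=` padding.
fn encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group (missing bytes are zero).
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3F) as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Decode padded standard Base64; returns `None` on malformed input.
/// (Non-canonical trailing bits are not rejected in this sketch.)
fn decode(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() / 4 * 3);
    for (idx, chunk) in bytes.chunks(4).enumerate() {
        let is_last = idx == bytes.len() / 4 - 1;
        let pad = chunk.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 || (pad > 0 && !is_last) {
            return None;
        }
        let mut n: u32 = 0;
        for &c in &chunk[..4 - pad] {
            let v = ALPHABET.iter().position(|&a| a == c)? as u32;
            n = (n << 6) | v;
        }
        n <<= 6 * pad as u32; // realign to a full 24-bit group
        let decoded = [(n >> 16) as u8, (n >> 8) as u8, n as u8];
        out.extend_from_slice(&decoded[..3 - pad]);
    }
    Some(out)
}

fn main() {
    let encoded = encode("hello".as_bytes());
    let decoded = decode(&encoded).and_then(|b| String::from_utf8(b).ok());
    println!("{encoded} -> {decoded:?}");
}
```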
Override the root TLS certificate list for subgraph requests (Issue #1503)
In some cases, users need to use self-signed certificates or use a custom certificate authority (CA) when communicating with subgraphs.
It is now possible to configure these certificate-related details for either specific subgraphs or all subgraphs, as follows:
tls:
  subgraph:
    all:
      certificate_authorities: "${file./path/to/ca.crt}"
    # Use a separate certificate for the `products` subgraph.
    subgraphs:
      products:
        certificate_authorities: "${file./path/to/product_ca.crt}"
The file referenced in the certificate_authorities value is expected to be the combination of several PEM certificates, concatenated together into a single file (as is commonplace with Apache TLS configuration).
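To see what such a bundle looks like to a consumer, the sketch below splits a concatenated file into its individual PEM certificate blocks. split_pem_bundle is hypothetical and performs no validation of the Base64 body or of the certificate itself.

```rust
/// Split a PEM bundle (several certificates concatenated in one file)
/// into individual PEM blocks. Lines outside a block are ignored.
fn split_pem_bundle(bundle: &str) -> Vec<String> {
    const BEGIN: &str = "-----BEGIN CERTIFICATE-----";
    const END: &str = "-----END CERTIFICATE-----";
    let mut certs = Vec::new();
    let mut current: Option<Vec<&str>> = None;
    for line in bundle.lines() {
        let line = line.trim();
        if line == BEGIN {
            current = Some(vec![line]);
        } else if let Some(block) = current.as_mut() {
            block.push(line);
            if line == END {
                certs.push(block.join("\n"));
                current = None;
            }
        }
    }
    certs
}

fn main() {
    // Placeholder bodies, not real certificates.
    let bundle = "-----BEGIN CERTIFICATE-----\nPLACEHOLDER_A\n-----END CERTIFICATE-----\n\
                  -----BEGIN CERTIFICATE-----\nPLACEHOLDER_B\n-----END CERTIFICATE-----\n";
    println!("found {} certificates", split_pem_bundle(bundle).len());
}
```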
These certificates are only configurable via the Router's configuration, because using SSL_CERT_FILE would also override certificates for sending telemetry and communicating with Apollo Uplink.
While we do not currently support terminating TLS at the Router (from clients), the tls section is located at the root of the configuration file so that all TLS-related configuration can be semantically grouped together in the future.
Note: If you are attempting to use a self-signed certificate, it must be generated with the proper extensions and with basicConstraints disabled, for example using a v3.ext extension file:
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer:always
# this has to be disabled
# basicConstraints = CA:TRUE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyAgreement, keyCertSign
subjectAltName = DNS:local.apollo.dev
issuerAltName = issuer:copy
Using this v3.ext file, the certificate can be generated from the appropriate certificate signing request (CSR), in this example server.csr, using the following openssl command:
openssl x509 -req -in server.csr -signkey server.key -out server.crt -extfile v3.ext
This produces the file server.crt, which can be passed as certificate_authorities.
Measure the Router's processing time (Issue #1949, Issue #2057)
The Router now emits a metric called apollo_router_processing_time, which measures the time spent executing the request minus the time spent waiting for external requests (e.g., subgraph requests/responses or external plugin requests/responses). This measurement accounts both for the time spent actually executing the request and for the time spent waiting for concurrent client requests to be executed. As with other time-related metrics the Router produces, the unit of measurement is seconds; this is not meant to indicate in any way that the Router adds actual seconds of overhead.
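The subtraction described above can be sketched as a per-request timer that accumulates time spent in external calls. ProcessingTimer is a hypothetical type. It assumes external calls run one after another; overlapping (parallel) subgraph fetches would need their waiting intervals merged rather than summed.

```rust
use std::time::{Duration, Instant};

/// Per-request timer that subtracts time spent waiting on external calls
/// (subgraphs, external plugins) from the total wall-clock time.
struct ProcessingTimer {
    start: Instant,
    waiting: Duration,
}

impl ProcessingTimer {
    fn start() -> Self {
        ProcessingTimer { start: Instant::now(), waiting: Duration::ZERO }
    }

    /// Run an external call and record how long it took as waiting time.
    fn external<T>(&mut self, call: impl FnOnce() -> T) -> T {
        let began = Instant::now();
        let result = call();
        self.waiting += began.elapsed();
        result
    }

    /// Processing time in seconds (total minus waiting), the metric's unit.
    fn processing_secs(&self) -> f64 {
        self.start.elapsed().saturating_sub(self.waiting).as_secs_f64()
    }
}

fn main() {
    let mut timer = ProcessingTimer::start();
    let _body = timer.external(|| {
        // Simulated subgraph round trip: excluded from processing time.
        std::thread::sleep(Duration::from_millis(20));
        "subgraph response"
    });
    println!("processing time: {:.6}s", timer.processing_secs());
}
```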
Automated persisted queries support for subgraph requests (PR #2284)
Automatic persisted queries (APQ) (see useful context in our Apollo Server docs) can now be used for subgraph requests. It is disabled by default and can be configured for all subgraphs or per subgraph:
supergraph:
  apq:
    subgraph:
      # override for all subgraphs
      all:
        enabled: false
      # override per subgraph
      subgraphs:
        products:
          enabled: true
By @krishna15898 and @Geal in #2284 and #2418
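The override rule reads naturally as a lookup with a fallback: a per-subgraph setting wins, and otherwise the all setting applies. SubgraphApqConfig is a hypothetical mirror of the YAML above, not the Router's configuration type, and the reviews subgraph name is made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of `supergraph.apq.subgraph` from the YAML above.
struct SubgraphApqConfig {
    /// `all.enabled`
    all_enabled: bool,
    /// `subgraphs.<name>.enabled`
    per_subgraph: HashMap<String, bool>,
}

impl SubgraphApqConfig {
    /// A per-subgraph setting overrides the `all` setting.
    fn enabled_for(&self, subgraph: &str) -> bool {
        self.per_subgraph.get(subgraph).copied().unwrap_or(self.all_enabled)
    }
}

fn main() {
    let config = SubgraphApqConfig {
        all_enabled: false,
        per_subgraph: HashMap::from([("products".to_string(), true)]),
    };
    for name in ["products", "reviews"] {
        println!("{name}: APQ enabled = {}", config.enabled_for(name));
    }
}
```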
Allow the disabling of automated persisted queries (PR #2386)
Automatic persisted queries (APQ) support is still enabled by default on the client side, but can now be disabled in the configuration:
supergraph:
  apq:
    enabled: false
Anonymous product usage analytics (Issue #2124, Issue #2397, Issue #2412)
Following up on #1630, the Router transmits anonymous usage telemetry about configurable feature usage, which helps guide Router product development. No information is transmitted in our usage collection that includes any request-specific information. Knowing what features and configuration our users are depending on allows us to evaluate opportunities to reduce complexity and remain diligent about the surface area of the Router over time. The privacy of your and your users' data is of critical importance to the core Router team and we handle it with great care in accordance with our privacy policy, which clearly states which data we collect and transmit and offers information on how to opt out.
Booleans and numeric values are included; however, any strings are represented as <redacted> to avoid leaking confidential or sensitive information.
For example:
{
  "session_id": "fbe09da3-ebdb-4863-8086-feb97464b8d7", // Randomly generated at Router startup.
  "version": "1.4.0", // The version of the router
  "os": "linux",
  "ci": null, // If CI is detected then this will name the CI vendor
  "usage": {
    "configuration.headers.all.request.propagate.named.<redacted>": 3,
    "configuration.headers.all.request.propagate.default.<redacted>": 1,
    "configuration.headers.all.request.len": 3,
    "configuration.headers.subgraphs.<redacted>.request.propagate.named.<redacted>": 2,
    "configuration.headers.subgraphs.<redacted>.request.len": 2,
    "configuration.headers.subgraphs.len": 1,
    "configuration.homepage.enabled.true": 1,
    "args.config-path.redacted": 1,
    "args.hot-reload.true": 1,
    // Many more keys. This is dynamic and will change over time.
    // More...
    // More...
    // More...
  }
}
Users can disable this mechanism by setting the environment variable APOLLO_TELEMETRY_DISABLED=true in their environment.
By @BrynCooke in #2173, #2398, #2413
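The redaction rule for values can be sketched as flattening a configuration tree into usage keys. Booleans and numbers are kept verbatim and strings are replaced with <redacted>. The Value enum and usage_keys function are hypothetical. The example above also redacts user-defined map keys (such as subgraph and header names), which this sketch does not attempt.

```rust
use std::collections::BTreeMap;

/// Hypothetical configuration tree for this sketch.
enum Value {
    Bool(bool),
    Number(i64),
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Flatten `value` into usage keys under `prefix`, counting occurrences.
/// Booleans and numbers are kept verbatim; strings become `<redacted>`.
fn usage_keys(prefix: &str, value: &Value, out: &mut BTreeMap<String, u64>) {
    let leaf = match value {
        Value::Map(map) => {
            for (key, child) in map {
                usage_keys(&format!("{prefix}.{key}"), child, out);
            }
            return;
        }
        Value::Bool(b) => b.to_string(),
        Value::Number(n) => n.to_string(),
        Value::Str(_) => "<redacted>".to_string(),
    };
    *out.entry(format!("{prefix}.{leaf}")).or_insert(0) += 1;
}

fn main() {
    let mut homepage = BTreeMap::new();
    homepage.insert("enabled".to_string(), Value::Bool(true));
    let mut supergraph = BTreeMap::new();
    supergraph.insert("listen".to_string(), Value::Str("0.0.0.0:4000".to_string()));
    let mut root = BTreeMap::new();
    root.insert("homepage".to_string(), Value::Map(homepage));
    root.insert("supergraph".to_string(), Value::Map(supergraph));
    let mut usage = BTreeMap::new();
    usage_keys("configuration", &Value::Map(root), &mut usage);
    for (key, count) in &usage {
        println!("{key}: {count}");
    }
}
```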
🐛 Fixes
Don't send header names to Studio if send_headers is none (Issue #2403)
We no longer transmit header names to Apollo Studio when send_headers is set to none (the default). Previously, when send_headers was set to none (like in the following example), the header names were still transmitted with empty header values. No actual values were ever sent unless send_headers was set to a more permissive option like forward_headers_only or forward_headers_except.
telemetry:
  apollo:
    send_headers: none
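The corrected behavior can be sketched as a filter over request headers in which none yields nothing, not even names. The SendHeaders enum mirrors only the options mentioned above, under hypothetical names. Header names are compared case-insensitively, as HTTP requires.

```rust
/// Hypothetical mirror of the options mentioned above.
enum SendHeaders {
    None,
    ForwardHeadersOnly(Vec<String>),
    ForwardHeadersExcept(Vec<String>),
}

/// HTTP header names are case-insensitive.
fn is_listed(names: &[String], name: &str) -> bool {
    names.iter().any(|n| n.eq_ignore_ascii_case(name))
}

/// Select the (name, value) pairs to report. `None` reports nothing at all,
/// not even header names with empty values.
fn headers_to_report<'a>(
    policy: &SendHeaders,
    headers: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    match policy {
        SendHeaders::None => Vec::new(),
        SendHeaders::ForwardHeadersOnly(names) => headers
            .iter()
            .copied()
            .filter(|(name, _)| is_listed(names, name))
            .collect(),
        SendHeaders::ForwardHeadersExcept(names) => headers
            .iter()
            .copied()
            .filter(|(name, _)| !is_listed(names, name))
            .collect(),
    }
}

fn main() {
    let headers = [("user-agent", "curl/8.0"), ("authorization", "Bearer token")];
    println!("{:?}", headers_to_report(&SendHeaders::None, &headers));
}
```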
Respond with Content-type: application/json when encountering incompatible Content-type or Accept request headers (Issue #2334)
When receiving requests with content-type and accept header mismatches (e.g., on multipart requests), the Router now uses a correct content-type header in its response.
Fix APOLLO_USAGE_REPORTING_INGRESS_URL behavior when the Router was run without a configuration file
The environment variable APOLLO_USAGE_REPORTING_INGRESS_URL (not usually necessary under typical operation) was not being applied correctly when the Router was run without a configuration file.
In addition, defaulting of environment variables now directly injects the variable rather than injecting it via an expansion expression. This means that the use of APOLLO_ROUTER_CONFIG_ENV_PREFIX (even less common) doesn't affect injected configuration defaults.
By @BrynCooke in #2432
🛠 Maintenance
Remove unused factory traits (PR #2372)
We removed a factory trait that was only used in a single implementation, which removes the overall requirement that execution and subgraph building take place via that factory trait.
Optimize header propagation plugin's regular expression matching (PR #2392)
We've changed the header propagation plugin's behavior to reduce the chance of memory allocations occurring when applying regex-based header propagation rules.
By @o0Ignition0o in #2392
📚 Documentation
Creating custom metrics in plugins (Issue #2294)
To create your custom metrics in Prometheus, you can use the tracing macros to generate an event. If you observe a specific naming pattern for your event, you'll be able to...