Releases: apollographql/router

v1.13.1

28 Mar 14:16
f65ea72

🚀 Features

Router homepage now supports redirecting to Apollo Studio Explorer (PR #2282)

To replicate the landing-page experience (called "homepage" on the Router) which was available in Apollo Gateway, we've introduced a graph_ref option to the homepage configuration. This allows users to be (optionally, as a sticky preference) redirected from the Apollo Router homepage directly to the correct graph in Apollo Studio Explorer.

Since users may have their own preference on the value, we do not automatically infer the graph reference (e.g., graph@variant), instead requiring that the user set it to the value of their choice.

For example:

homepage:
  graph_ref: my-org-graph@production

By @flyboarder in #2282

New metric for subgraph-requests, including "retry" and "break" events (Issue #2518), (Issue #2736)

We now emit an apollo_router_http_request_retry_total metric from the Router. The metric also offers observability into aborted requests via a status = "aborted" attribute on the metric.

By @Geal in #2829

New receive_body span represents time consuming a client's request body (Issue #2518), (Issue #2736)

When running with debug-level instrumentation, the Router now emits a receive_body span which tracks time spent receiving the request body from the client.

By @Geal in #2829

🐛 Fixes

Use single Deno runtime for query planning (Issue #2690)

We now keep the same JavaScript-based query-planning runtime alive for the entirety of the Router's lifetime, rather than disposing of it and creating a new one at several points in time, including when processing GraphQL requests, generating an "API schema" (the publicly queryable version of the supergraph, with private fields excluded), and when processing introspection queries.

Not only is this a preferable architecture that is more considerate of system resources, but the previous approach was also responsible for a memory leak which occurred during supergraph changes.

We believe this will alleviate, but not entirely solve, the circumstances seen in the above-linked issue.

By @Geal in #2706

v1.13.0

24 Mar 10:34
e341245

🚀 Features

Uplink metrics and improved logging (Issue #2769, Issue #2815, Issue #2816)

To support monitoring, observability, and debugging of Uplink-related behaviors (those which occur as part of Managed Federation), the router now emits better log messages and new metrics. The new metrics are:

  • apollo_router_uplink_fetch_duration_seconds_bucket: A histogram of durations with the following attributes:

    • url: The URL that was polled
    • query: SupergraphSdl or Entitlement
    • type: new, unchanged, http_error, uplink_error, or ignored
    • code: The error code, depending on type
    • error: The error message
  • apollo_router_uplink_fetch_count_total: A gauge that tracks the overall number of successes (status="success") or failures (status="failure") that occur when communicating with Uplink, without taking fallback into account.

⚠️ The very first poll to Uplink is unable to capture metrics since it's so early in the router's lifecycle that telemetry hasn't yet been set up. We consider this a suitable trade-off and don't want to allow perfect to be the enemy of good.

Here's an example of what these new metrics look like from the Prometheus scraping endpoint:

# HELP apollo_router_uplink_fetch_count_total apollo_router_uplink_fetch_count_total
# TYPE apollo_router_uplink_fetch_count_total gauge
apollo_router_uplink_fetch_count_total{query="SupergraphSdl",service_name="apollo-router",status="success"} 1
# HELP apollo_router_uplink_fetch_duration_seconds apollo_router_uplink_fetch_duration_seconds
# TYPE apollo_router_uplink_fetch_duration_seconds histogram
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.001"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.005"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.015"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.05"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.1"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.2"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.3"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.4"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="0.5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="1"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="10"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/",le="+Inf"} 1
apollo_router_uplink_fetch_duration_seconds_sum{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/"} 0.465257131
apollo_router_uplink_fetch_duration_seconds_count{kind="unchanged",query="SupergraphSdl",service_name="apollo-router",url="https://uplink.api.apollographql.com/"} 1
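
Prometheus histogram buckets are cumulative, so the first bucket with a non-zero count tells you the upper bound that contained the fastest observation. The following is a minimal Python sketch, not part of the Router, that reads bucket lines like the ones above. The SAMPLE text is an abbreviated copy of that output (some labels dropped for brevity), and the helper names are made up for illustration.

```python
import re

# Abbreviated copy of the scrape output above; labels trimmed for brevity.
SAMPLE = '''\
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",le="0.4"} 0
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",le="0.5"} 1
apollo_router_uplink_fetch_duration_seconds_bucket{kind="unchanged",query="SupergraphSdl",le="+Inf"} 1
'''

BUCKET_RE = re.compile(r'^(\w+)_bucket\{(.*)\}\s+(\d+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_buckets(text):
    """Return (upper_bound, cumulative_count) pairs sorted by upper bound."""
    buckets = []
    for line in text.splitlines():
        match = BUCKET_RE.match(line.strip())
        if not match:
            continue  # skip HELP/TYPE comments, _sum and _count lines
        labels = dict(LABEL_RE.findall(match.group(2)))
        bound = float("inf") if labels["le"] == "+Inf" else float(labels["le"])
        buckets.append((bound, int(match.group(3))))
    return sorted(buckets)


def first_nonempty_bucket(buckets):
    """Smallest upper bound whose cumulative count is above zero, or None."""
    for bound, count in buckets:
        if count > 0:
            return bound
    return None


buckets = parse_buckets(SAMPLE)
print(first_nonempty_bucket(buckets))
```

For the sample, the first non-empty bucket is le="0.5", which is consistent with the reported sum of roughly 0.465 seconds for a single observation.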

By @BrynCooke in #2779, #2817, #2819, #2826

🐛 Fixes

Only process Uplink messages that are deemed to be newer (Issue #2794)

Uplink is backed by multiple cloud providers to ensure high availability. However, this means that there will be periods of time where Uplink endpoints do not agree on what the latest data is. They are eventually consistent.

This has not been a problem for most users, as the default mode of operation for the router is to fall back to the secondary Uplink endpoint if the first fails.

The other mode of operation is round-robin, which is triggered only when setting the APOLLO_UPLINK_ENDPOINTS environment variable. In this mode there is a much higher chance that the router will go back and forth between schema versions due to disagreement between the Apollo Uplink servers or any user-provided proxies set in this variable.

This change introduces two fixes:

  1. The Router will only use the fallback strategy. Uplink endpoints are not strongly consistent, and therefore it is better to always poll a primary source of information if available.
  2. Uplink already handled freshness of schema but now also handles entitlement freshness.

Note: We advise against using APOLLO_UPLINK_ENDPOINTS to try to cache uplink responses for high availability purposes. Each request to Uplink currently sends state which limits the usefulness of such a cache.
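
The two fixes can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the Router's implementation: the function and class names are invented, and the `version` field stands in for whatever ordering information Uplink responses actually carry.

```python
from typing import Callable, Optional, Sequence


def fetch_with_fallback(endpoints: Sequence[str],
                        fetch: Callable[[str], dict]) -> dict:
    """Always start at the primary (index 0); only move on when a fetch fails."""
    errors = []
    for url in endpoints:
        try:
            return fetch(url)
        except Exception as err:  # any failure triggers the next fallback
            errors.append((url, err))
    raise RuntimeError(f"all endpoints failed: {errors}")


class NewerOnly:
    """Accept a message only if it is newer than the last accepted one."""

    def __init__(self) -> None:
        self.latest: Optional[int] = None

    def accept(self, message: dict) -> bool:
        version = message["version"]  # hypothetical, monotonically increasing
        if self.latest is not None and version <= self.latest:
            return False  # stale or duplicate: ignore it
        self.latest = version
        return True
```

Because every poll begins at the primary, a lagging secondary is only consulted when the primary is unavailable, and the freshness check discards anything older than what has already been applied.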

By @BrynCooke in #2803, #2826, #2846

Distributed caching: Don't send Redis' CLIENT SETNAME (PR #2825)

We no longer send the CLIENT SETNAME command to connected Redis servers. This resolves an incompatibility with some Redis-compatible servers, since not all "Redis-compatible" offerings (like Google Memorystore) actually support every Redis command. We didn't actually need this feature; it was just one that could be enabled optionally on our Redis client. No Router functionality is impacted.

By @Geal in #2825

Support bare top-level __typename when aliased (Issue #2792)

PR #1762 implemented support for the query { __typename } but it did not work properly if the top-level standalone __typename field was aliased. This now works properly.

By @glasser in #2791

Maintain errors set on _entities (Issue #2731)

In their responses, some subgraph implementations do not return errors per entity but instead on the entire path. We now transmit those errors regardless.

By @Geal in #2756

📃 Configuration

Custom OpenTelemetry Datadog exporter mapping (Issue #2228)

This PR fixes the issue with the Datadog exporter not providing meaningful contextual data in the Datadog traces.
There is a known issue where OpenTelemetry is not fully compatible with Datadog.

To fix this, the opentelemetry-datadog crate added custom mapping functions.

Now, when enable_span_mapping is set to true, the Apollo Router will perform the following mapping:

  1. Use the OpenTelemetry span name to set the Datadog span operation name.
  2. Use the OpenTelemetry span attributes to set the Datadog span resource name.

For example:

Suppose we send a query named MyQuery to the Apollo Router. Following the operation's query plan, the Router then sends a query to my-subgraph-name, producing the following trace:

    | apollo_router request                                                                 |
        | apollo_router router                                                              |
            | apollo_router supergraph                                                      |
            | apollo_router query_planning  | apollo_router execution                       |
                                                | apollo_router fetch                       |
                                                    | apollo_router subgraph                |
                                                        | apollo_router subgraph_request    |

As you can see, there is no clear information about the name of the query, the name of the subgraph, or the name of the query sent to the subgraph.

Instead, with this new enable_span_mapping setting set to true, the following trace will be created:

    | request /graphql                                                                                   |
        | router                                                                                         |
            | supergraph MyQuery                                                                         |
                | query_planning MyQuery  | execution                                    ...

v1.12.1

15 Mar 18:18
7be7ce5

🎈 This is a fast-follow to v1.12.0 which included many new updates and new GraphOS Enterprise features. Be sure to check that (longer, more detailed!) changelog for the full details. Thanks!

🐛 Fixes

Retain existing Apollo Uplink entitlements (PR #2781)

Our end-to-end integration testing revealed a bug newly introduced in v1.12.0 which could affect requests to Apollo Uplink endpoints located in different data centers when those requests yield differing responses. This only impacted a very small number of cases, but retaining previously fetched values is more durable and resolves the problem, so we're expediting this fix.

By @BrynCooke in #2781

v1.12.0

15 Mar 13:08

🎈 In this release, we are excited to make three new features generally available to GraphOS Enterprise customers running self-hosted routers: JWT Authentication, Distributed APQ Caching, and External Coprocessor support. Read more about these features below, and see our documentation for additional information.

🚀 Features

GraphOS Enterprise: JWT Authentication

🎈 JWT Authentication is now generally available to GraphOS Enterprise customers running self-hosted routers. To fully account for the changes between the initial experimental release and the final generally available implementation, we recommend removing the experimental configuration and re-implementing it following the documentation below to ensure proper configuration and that all security requirements are met.

Router v1.12 adds support for JWT validation, claim extraction, and custom security policies in Rhai scripting to reject bad traffic at the edge of the graph — for enhanced zero-trust and defense-in-depth. Extracting claims one time in the router and securely forwarding them to subgraphs can reduce the operational burden on backend API teams, reduce JWT processing, and speed up response times with improved header matching for increased query deduplication.

See the JWT Authentication documentation for information on setting up this GraphOS Enterprise feature.

GraphOS Enterprise: Distributed APQ Caching

🎈 Distributed APQ Caching is now generally available to GraphOS Enterprise customers running self-hosted routers. To fully account for the changes between the initial experimental releases and the final generally available implementation, we recommend removing the experimental configuration and re-implementing it following the documentation below to ensure proper configuration.

With Router v1.12, you can now use distributed APQ caching to improve p99 latencies during peak times. A shared Redis instance can now be used by the entire router fleet to build the APQ cache faster and share existing APQ cache with new router instances that are spun up during scaling events – when they need it most. This ensures the fast path to query execution is consistently available to all users even during peak load.

See the distributed APQ caching documentation for information on setting up this GraphOS Enterprise feature.

GraphOS Enterprise: External Coprocessor support

🎈 External Coprocessor support is now generally available to GraphOS Enterprise customers running self-hosted routers. To fully account for the changes between the initial experimental releases and the final generally available implementation, we recommend removing the experimental configuration and re-implementing it following the documentation below to ensure proper configuration.

The Router now supports external coprocessors written in your programming language of choice. Coprocessors run with full isolation and a clean separation of concerns, which decouples delivery and provides fault isolation. Low overhead can be achieved by running coprocessors alongside the router on the same host or in the same Kubernetes Pod as a sidecar. Coprocessors can be used to speed Gateway migrations, support bespoke use cases, or integrate the router with existing network services for custom auth (JWT mapping, claim enrichment), service discovery integration, and more!

See the external coprocessor documentation for information on setting up this GraphOS Enterprise feature.

TLS termination (Issue #2615)

If there is no intermediary proxy or load-balancer present capable of doing it, the router ends up responsible for terminating TLS. This can be relevant in the case of needing to support HTTP/2, which requires TLS in most implementations. We've introduced TLS termination support for the router using the rustls implementation, limited to one server certificate and using safe default ciphers. We do not support TLS versions prior to v1.2.

If you require more advanced TLS termination than this implementation offers, we recommend using a proxy which supports this (as is the case with most cloud-based proxies today).

By @Geal in #2614

Make initialDelaySeconds configurable for health check probes in Helm chart

Currently, initialDelaySeconds uses the default of 0. This means that Kubernetes gives the router no additional time before it performs the first probe.

This can be configured as follows:

probes:
  readiness:
    initialDelaySeconds: 1
  liveness:
    initialDelaySeconds: 5

By @Meemaw in #2660

GraphQL errors can be thrown within Rhai (PR #2677)

Until now, a throw in a Rhai script would yield an HTTP status code and a message string, which would end up as a GraphQL error.
This change allows users to throw with a valid GraphQL response body, which may include data as well as errors and extensions.

Refer to the Terminating client requests section of the Rhai API documentation to learn how to throw GraphQL payloads.

By @o0Ignition0o in #2677

🐛 Fixes

In-flight requests will terminate before shutdown is completed (Issue #2539)

In-flight client requests will now be completed when the router is asked to shut down gracefully.

By @Geal in #2610

State machine will retain most recent valid config (Issue #2752)

The state machine will retain the current state until the new state has gone into service. Previously, if the router failed to reload either the configuration or the supergraph, it would discard the incoming state change even if that state change turned out to be invalid. It is important to avoid reloading inconsistent state because a new supergraph may, for example, directly rely on changes in config to work correctly.

Changing this behaviour means that the router must enter a "good" configuration state before it will reload, rather than reloading with potentially inconsistent state.

For example, previously:

  1. Router starts with valid supergraph and config.
  2. Router config is set to something invalid and restart doesn't happen.
  3. Router receives a new schema, the router restarts with the new supergraph and the original valid config.

Now, the latest information is used to restart the router:

  1. Router starts with valid schema and config.
  2. Router config is set to something invalid and restart doesn't happen.
  3. Router receives a new schema, but the router fails to restart because the config is still invalid.
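
The new behaviour can be sketched as a tiny state machine in Python. This is a simplified illustration, not the Router's code: the class names, the string-valued config and schema, and the `is_valid_config` callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RouterState:
    config: str
    schema: str


class ReloadStateMachine:
    """Keep serving the active state until the latest inputs are all valid."""

    def __init__(self, initial: RouterState,
                 is_valid_config: Callable[[str], bool]) -> None:
        self.active = initial  # what is currently in service
        # Most recent inputs received, whether or not they are valid.
        self.latest = RouterState(initial.config, initial.schema)
        self._is_valid_config = is_valid_config

    def on_config(self, config: str) -> bool:
        self.latest.config = config
        return self._try_reload()

    def on_schema(self, schema: str) -> bool:
        self.latest.schema = schema
        return self._try_reload()

    def _try_reload(self) -> bool:
        # Reload only from the latest inputs; never mix a new schema
        # with an older config that has since been superseded.
        if not self._is_valid_config(self.latest.config):
            return False
        self.active = RouterState(self.latest.config, self.latest.schema)
        return True
```

With an invalid config pending, a new schema does not trigger a reload; once a valid config arrives, the reload picks up both the new config and the new schema together.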

By @BrynCooke in #2753

Ability to disable HTTP/2 for subgraphs (Issue #2063)

There are cases where the balancing of HTTP/2 connections to subgraphs behaves erratically. While we consider this a bug, users may disable HTTP/2 support to subgraphs in the short term while we work to find the root cause.

By @Geal in #2621

Tracing default service name restored (Issue #2641)

With this fix the default tracing service name is restored to router.

By @BrynCooke in #2642

Header plugin now has a static plugin priority (Issue #2559)

Execution order of the headers plugin which handles header forwarding is now enforced. This ensures reliable behavior with other built-in plugins.

It is now possible to use custom attributes derived from headers within the telemetry plugin in addition to using the headers plugin to propagate/insert headers for subgraphs.

By @bnjjj in #2670

Add content-type header when publishing Datadog metrics (Issue #2697)

Add the required content-type header for publishing Datadog metrics from Prometheus:

content-type: text/plain; version=0.0.4

By @ShaunPhillips in #2698

Sandbox Explorer endpoint URL is no longer editable (PR #2729)

The "Endpoint" in the Sandbox Explorer (which is served by default when running in development mode) is no longer editable, to prevent inadvertent changes. Sandbox is not generally useful with other endpoints, as CORS must be configured on the other host.

A hosted version of Sandbox Explorer without this restriction is still available if you need a version which allows editing.

By @mayakoneval in #2729

Argument parsing is now optional in the Executable builder (PR #2666)

The Executable builder was parsing command-line arguments, which caused issues when it was used as part of a larger application with its own set of command-line flags, leading to those arguments not being recognized by the router. This change allows parsing the arguments separately, then passing the required ones to the Executable builder directly. The default behav...

v1.11.0

21 Feb 14:59
cf8d8e4

🚀 Features

Support for UUID and Unix timestamp functions in Rhai (PR #2617)

When building Rhai scripts, you'll often need to add headers that either uniquely identify a request or carry timestamp information for later processing, for example when crafting a trace header.

While the default timestamp() and similar functions (e.g. apollo_start) can be used, their values can't be converted into an epoch timestamp.

This adds uuid_v4() and unix_now() functions to obtain a UUID and a Unix timestamp, respectively.

By @lleadbet in #2617

Show option to "Include Cookies" in Sandbox

Adds default support when using the "Include Cookies" toggle in the Embedded Sandbox.

By @esilverm in #2553

Add a metric to track the cache size (Issue #2522)

We've introduced a new apollo_router_cache_size metric that reports the current size of in-memory caches. Like other metrics, it is available via OpenTelemetry Metrics including Prometheus scraping.

By @Geal in #2607

Add a rhai global variable resolver and populate it (Issue #2628)

Rhai scripts cannot access Rust global constants by default, making cross-plugin communication via Context difficult.

This change introduces a new global variable resolver and populates it with a Router global constant. It currently has three members:

  • APOLLO_START -> should be used in place of apollo_start
  • APOLLO_SDL -> should be used in place of apollo_sdl
  • APOLLO_AUTHENTICATION_JWT_CLAIMS

You access a member of this variable as follows:

let my_var = Router.APOLLO_SDL;

We are removing the experimental APOLLO_AUTHENTICATION_JWT_CLAIMS constant, but we will retain the existing non-experimental constants for purposes of backwards compatibility.

We recommend that you shift to the new global constants since we will remove the old ones in a major breaking change release in the future.

By @garypen in #2627

Activate TLS for Redis cluster connections (Issue #2332)

This adds support for TLS connections in Redis Cluster mode, applied when the URLs use the rediss scheme.

By @Geal in #2605

Make terminationGracePeriodSeconds property configurable in the Helm chart

The terminationGracePeriodSeconds property is now configurable on the Deployment object in the Helm chart.

This can be useful when adjusting the default timeout values for the Router, and should always be a value slightly larger than the Router timeout to ensure that no requests are closed prematurely on shutdown.

The Router timeout is configured via traffic_shaping:

traffic_shaping:
  router:
    timeout: ...

By @Meemaw in #2582

🐛 Fixes

Properly emit histograms metrics via OpenTelemetry (Issue #2393)

With the "inexpensive" metrics selector, histograms were only reported as gauges, which caused them to be interpreted incorrectly when reaching Datadog.

By @Geal in #2564

Revisit OpenTelemetry integration (Issue #1812, Issue #2359, Issue #2338, Issue #2113)

There were several issues with the existing OpenTelemetry integration in the Router which we are happy to have resolved with this re-factoring:

  • Metrics would stop working after a schema or config update.

  • Telemetry config could not be changed at runtime, instead requiring a full restart of the router.

  • Logging format would vary depending on where the log statement existed in the code.

  • On shutdown, the following message occurred frequently:

    OpenTelemetry trace error occurred: cannot send span to the batch span processor because the channel is closed
    
  • And worst of all, it had a tendency to leak memory.

We have corrected these by re-visiting the way we integrate with OpenTelemetry and the supporting tracing packages. The new implementation brings our usage in line with new best-practices.

In addition, the testing coverage for telemetry in general has been significantly improved. For more details of what changed and why, take a look at #2358.

By @BrynCooke and @Geal and @bnjjj in #2358

Metrics attributes allow value types as defined by OpenTelemetry (Issue #2510)

Metrics attributes in OpenTelemetry allow the following types:

  • string
  • string[]
  • float
  • float[]
  • int
  • int[]
  • bool
  • bool[]

However, our configuration only allowed strings. This has been fixed, and therefore it is now possible to use booleans via environment variable expansion as metrics attributes.

For example:

telemetry:
  metrics:
    prometheus:
      enabled: true
    common:
      attributes:
        supergraph:
          static:
            - name: "my_boolean"
              value: ''

By @BrynCooke in #2616

Add missing status attribute on some metrics (PR #2593)

When labeling metrics, the Router did not consistently add the status attribute, resulting in an empty status. You'll now have status="500" for Router errors.

By @bnjjj in #2593

🛠 Maintenance

Upgrade to Apollo Federation v2.3.2

This brings in a patch update to our Federation support, bringing it to v2.3.2.

By @abernix in #2586

CORS: Give a more meaningful message for users who misconfigured allow_any_origin (PR #2634)

Allowing "any" origin in the router configuration can be done as follows:

cors:
  allow_any_origin: true

However, some intuition and familiarity with the CORS specification might also lead someone to configure it as follows:

cors:
  origins:
    - "*"

Unfortunately, this won't work and the error message received when it was attempted was neither comprehensive nor actionable:

ERROR panicked at 'Wildcard origin (`*`) cannot be passed to `AllowOrigin::list`. Use `AllowOrigin::any()` instead'

This usability improvement adds helpful instructions to the error message, pointing you to the correct pattern for setting up this behavior in the router:

Invalid CORS configuration: use `allow_any_origin: true` to set `Access-Control-Allow-Origin: *`

By @o0Ignition0o in #2634

🧪 Experimental

Cleanup the error reporting in the experimental JWT authentication plugin (PR #2609)

Introduce a new AuthenticationError enum to document and consolidate various JWT processing errors that may occur.

By @garypen in #2609

v1.10.3

10 Feb 18:29
46f4079

🐛 Fixes

Per-type metrics based on FTV1 from subgraphs (Issue #2551)

Since version 1.7.0, Apollo Router generates metrics directly instead of deriving them from traces being sent to Apollo Studio. However, these metrics were incomplete. This adds, based on data reported by subgraphs, the following:

  • Statistics about each field of each type of the GraphQL type system
  • Statistics about errors at each path location of GraphQL responses

By @SimonSapin in #2541

🛠 Maintenance

Run rustfmt on xtask/, too (Issue #2557)

Our xtask runs cargo fmt --all, which reformats the Rust code in all crates of the workspace. However, the code of xtask itself is a separate workspace. In order for it to be formatted with the same configuration, running a second cargo command is required. This adds that second command, and applies the corresponding formatting.

Fixes #2557

By @SimonSapin in #2561

🧪 Experimental

Add support to JWT Authentication for JWK without specified alg

Prior to this change, the router would only make use of a JWK for JWT verification if the key had an alg property.

Now, the router searches through the set of configured JWKS (JSON Web Key Sets) to find the best matching JWK according to the following criteria:

  • a matching kid and alg; or
  • a matching kid and algorithm family (kty, per RFC 7517); or
  • a matching algorithm family (kty)

The algorithm family is used when the JWKS contain a JWK for which no alg is specified.
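
The selection order can be sketched in Python as a scoring function. This is an illustrative reading of the criteria above, not the Router's implementation. It assumes that the family-based criteria only apply to keys with no alg, and the helper names are made up. The alg-to-kty mapping follows the JOSE registrations (RS*/PS* use RSA keys, ES* use EC, HS* use oct, EdDSA uses OKP).

```python
from typing import Optional


def alg_family(alg: str) -> Optional[str]:
    """Map a JWT alg to the JWK key type (kty) it requires."""
    if alg.startswith(("RS", "PS")):
        return "RSA"
    if alg.startswith("ES"):
        return "EC"
    if alg.startswith("HS"):
        return "oct"
    if alg == "EdDSA":
        return "OKP"
    return None


def select_jwk(jwks: list, kid: Optional[str], alg: str) -> Optional[dict]:
    """Pick the best-matching key, or None if nothing qualifies."""
    family = alg_family(alg)

    def score(key: dict) -> int:
        kid_match = kid is not None and key.get("kid") == kid
        if kid_match and key.get("alg") == alg:
            return 3  # matching kid and alg
        # Assumption: fall back to the family only when the key has no alg.
        family_match = "alg" not in key and key.get("kty") == family
        if kid_match and family_match:
            return 2  # matching kid and algorithm family
        if family_match:
            return 1  # matching algorithm family only
        return 0

    best = max(jwks, key=score, default=None)
    return best if best is not None and score(best) > 0 else None
```

Ties are resolved in favour of the first key in the set, which is a choice made for this sketch rather than documented behaviour.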

By @garypen in #2540

v1.10.2

08 Feb 14:19
5c21ca5

🐛 Fixes

Resolve incorrect nullification when using @interfaceObject with particular response objects

Note: This follows up on the v1.10.1 release which also attempted to fix this, but inadvertently excluded a required part of the fix due to an administrative oversight.

The Federation 2.3.x @interfaceObject feature implies that an interface type in the supergraph may be locally handled as an object type by some specific subgraphs. Therefore, such subgraphs may return objects whose __typename is the interface type in their response. In some cases, those __typename were leading the Router to unexpectedly and incorrectly nullify the underlying objects. This was not caught in the initial integration of Federation 2.3.

By @pcmanus in #2530

🛠 Maintenance

Refactor Uplink implementation (Issue #2547)

The Apollo Uplink implementation within Apollo Router, which is used for fetching data from Apollo GraphOS, has been decomposed into a reusable component so that it can be used more generically for fetching artifacts. This generally improved code quality and resulted in several new tests being added.

Additionally, our round-robin fetching behaviour is now more durable. Previously, on failure, there would be a delay before trying the next round-robin URL. Now, all URLs will be tried in sequence until exhausted. If ultimately all URLs fail, then the usual delay is applied before trying again.
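
A rough Python sketch of this retry shape follows. It is not the Router's code: the function names, the injectable `sleep`, and the `max_rounds` cap (added so the example terminates) are assumptions for illustration.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def poll_once(endpoints: Sequence[str], start: int,
              fetch: Callable[[str], str]) -> Tuple[Optional[str], int]:
    """Try every endpoint once, beginning at `start`, with no delay between."""
    count = len(endpoints)
    for offset in range(count):
        index = (start + offset) % count
        try:
            # On success, the next poll starts at the following endpoint.
            return fetch(endpoints[index]), (index + 1) % count
        except Exception:
            continue  # move straight on to the next URL
    return None, start  # every URL failed


def poll_until_success(endpoints: Sequence[str], fetch: Callable[[str], str],
                       delay_s: float, sleep: Callable[[float], None] = time.sleep,
                       max_rounds: int = 3) -> Optional[str]:
    """Apply the delay only after a full pass over all URLs has failed."""
    start = 0
    for _ in range(max_rounds):
        result, start = poll_once(endpoints, start, fetch)
        if result is not None:
            return result
        sleep(delay_s)
    return None
```

Passing a fake `sleep` makes the delay behaviour easy to observe in tests without actually waiting.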

By @BrynCooke in #2537

Improve Changelog management through conventions and tooling (PR #2545, PR #2534)

New tooling and conventions adjust our "incoming changelog in the next release" mechanism to no longer rely on a single file, but instead leverage a "file per feature" pattern in conjunction with tooling to create that file.

This stubbing takes place through the use of a new command:

cargo xtask changeset create

For more information on the process, read the README in the ./.changesets directory or consult the referenced Pull Requests below.

By @abernix in #2545 and #2534

v1.10.1

07 Feb 17:16

🐛 Fixes

Federation v2.3.1 (Issue #2556)

Update to Federation v2.3.1 to fix subtle bug in @interfaceObject.

By @abernix in #2554

🛠 Maintenance

Redis integration tests (Issue #2174)

We now have integration tests for Redis usage with Automatic Persisted Queries and query planning.

By @Geal in #2179

CI: Enable compliance checks except licenses.html update (Issue #2514)

In #1573, we removed the compliance checks for non-release CI pipelines, because cargo-about output would change ever so slightly on each run.

While many of the checks provided by the compliance check are license related, some checks prevent us from inadvertently downgrading libraries and needing to open, e.g., Issue #2512.

This set of changes includes the following:

  • Introduce cargo xtask licenses to update licenses.html.
  • Separate compliance (cargo-deny, which includes license checks) and licenses generation (cargo-about) in xtask
  • Enable compliance as part of our CI checks for each open PR
  • Update cargo xtask all so it runs tests, checks compliance and updates licenses.html
  • Introduce cargo xtask dev so it checks compliance and runs tests

Going forward, when developing on the Router source:

  • Use cargo xtask all to make sure everything is up to date before a release.
  • Use cargo xtask dev before a PR.

As a last note, updating licenses.html is now driven by cargo xtask licenses, which is part of the release checklist and automated through our release tooling in xtask.

By @o0Ignition0o in #2520

Fix flaky tracing integration test (Issue #2548)

Disable federated-tracing (FTV1) in tests by lowering the sampling rate to zero so that consistent results are generated in test snapshots.

By @BrynCooke in #2549

Update to Rust 1.67

We've updated the Minimum Supported Rust Version (MSRV) to v1.67.

By @SimonSapin in #2496 and #2499

v1.10.0

01 Feb 13:53
4ceffa7

🚀 Features

Update to Federation v2.3.0 (Issue #2465, Issue #2485 and Issue #2489)

This brings in Federation v2.3.0 execution support for:

By @abernix and @o0Ignition0o in #2462
By @pcmanus in #2485 and #2489

Always deduplicate variables on subgraph entity fetches (Issue #2387)

Variable deduplication allows the router to reduce the number of entities that are requested from subgraphs if some of them are redundant, and as such reduce the size of subgraph responses. It has been available for a while but was not active by default. This is now always on.
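As a rough illustration of the idea (not the Router's actual implementation), the sketch below deduplicates a list of entity representations before a fetch and then re-expands the subgraph's response to the original positions. Modelling representations as plain strings, and the function names `dedup_representations` and `expand_entities`, are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Deduplicates entity representations before a subgraph fetch.
/// Returns the unique representations plus, for each original position,
/// the index of its entry in the unique list.
/// (Hypothetical helper; representations are modelled as strings.)
fn dedup_representations(representations: &[String]) -> (Vec<String>, Vec<usize>) {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut unique = Vec::new();
    let mut positions = Vec::with_capacity(representations.len());
    for repr in representations {
        // Reuse the index of an already-seen representation, or append a new one.
        let index = *seen.entry(repr.as_str()).or_insert_with(|| {
            unique.push(repr.clone());
            unique.len() - 1
        });
        positions.push(index);
    }
    (unique, positions)
}

/// Re-expands the entities returned for the unique representations so that
/// every original position receives its result.
fn expand_entities(entities: &[String], positions: &[usize]) -> Vec<String> {
    positions.iter().map(|&i| entities[i].clone()).collect()
}
```

With three requested representations of which two are identical, only two are sent to the subgraph, and the two returned entities are mapped back onto three positions.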

By @Geal in #2445

Add optional Access-Control-Max-Age header to CORS plugin (Issue #2212)

Adds a new option called max_age to the existing cors object, which sets the value returned in the Access-Control-Max-Age header. As before, when this option is not set, the header is not returned.

It can be enabled using our standard time notation, as follows:

cors:
  max_age: 1day

By @osamra-rbi in #2331

Improved support for wildcards in supergraph.path configuration (Issue #2406)

You can now use a wildcard in the supergraph endpoint path, like this:

supergraph:
  listen: 0.0.0.0:4000
  path: /graph*

In this example, the Router would respond to requests on both /graphql and /graphiql.
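The matching behaviour described above can be sketched as a simple prefix check. This is a minimal illustration that assumes a trailing `*` matches any suffix; the function name `path_matches` is hypothetical, and the Router's real routing is handled by its HTTP framework.

```rust
/// Returns true when `path` matches `pattern`, where a trailing `*`
/// in the pattern matches any (possibly empty) suffix.
/// (Hypothetical helper for illustration only.)
fn path_matches(pattern: &str, path: &str) -> bool {
    match pattern.strip_suffix('*') {
        // Wildcard pattern: everything before the `*` must be a prefix of the path.
        Some(prefix) => path.starts_with(prefix),
        // No wildcard: require an exact match.
        None => pattern == path,
    }
}
```

Under this sketch, `/graph*` accepts both `/graphql` and `/graphiql`, while a pattern without a wildcard only accepts the exact path.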

By @bnjjj in #2410

🐛 Fixes

Forbid caching PERSISTED_QUERY_NOT_FOUND responses (Issue #2502)

The Router now sends a cache-control: private, no-cache, must-revalidate response header to clients whenever a persisted query hash cannot be found, in addition to the PERSISTED_QUERY_NOT_FOUND error code it already returned. This matters because intermediary proxies and CDNs must not cache such responses: the client needs to be able to send the full query to the Router on a subsequent request.
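A minimal sketch of that decision, assuming the response's error codes are available as a list of strings. The helper name `cache_control_for` is hypothetical and is not part of the Router's API.

```rust
/// Picks the `cache-control` header value for a response, given the
/// GraphQL error codes it carries. Returns `None` when no header
/// needs to be added. (Hypothetical helper for illustration only.)
fn cache_control_for(error_codes: &[&str]) -> Option<&'static str> {
    if error_codes.contains(&"PERSISTED_QUERY_NOT_FOUND") {
        // Intermediaries must not cache this response: the client will
        // retry with the full query text.
        Some("private, no-cache, must-revalidate")
    } else {
        None
    }
}
```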

By @o0Ignition0o in #2503

Listen on root URL when /* is set in supergraph.path configuration (Issue #2471)

This resolves a regression which occurred in Router 1.8 when using wildcard notation on a path-boundary, as such:

supergraph:
  path: /*

This occurred due to an underlying Axum upgrade and resulted in failure to listen on localhost when a path was absent. We now special case /* to also listen to the URL without a path so you're able to call http://localhost (for example).

By @bnjjj in #2472

Subgraph traffic shaping timeouts now return HTTP 504 status code (Issue #2360 Issue #2400)

A regression caused timeouts to result in an HTTP response of 500 Internal Server Error. This is now fixed, with a test to guard against it: the status code is now 504 Gateway Timeout rather than the earlier 408 Request Timeout, which was also incorrect because it blamed the client.

There is also a new metric emitted called apollo_router_timeout to track when timeouts are triggered.
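The status-code choice can be sketched as follows. The `SubgraphOutcome` type and `status_code` function are assumptions made for this example and do not mirror the Router's internal types.

```rust
use std::time::Duration;

/// Hypothetical outcome of a subgraph call (illustration only).
#[derive(Debug)]
enum SubgraphOutcome {
    /// The subgraph answered in time.
    Completed,
    /// The subgraph did not answer within the configured timeout.
    TimedOut(Duration),
}

/// Maps an outcome to the HTTP status code returned to the client.
fn status_code(outcome: &SubgraphOutcome) -> u16 {
    match outcome {
        SubgraphOutcome::Completed => 200,
        // The upstream service was too slow, so this is 504 Gateway Timeout,
        // not 408 Request Timeout (which blames the client) and not 500.
        SubgraphOutcome::TimedOut(_) => 504,
    }
}
```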

By @Geal in #2419

Fix panic in schema parse error reporting (Issue #2269)

In order to support introspection, some definitions like type __Field { … } are implicitly added to schemas. This addition was done by string concatenation at the source level. In some cases, like unclosed braces, a parse error could be reported at a position beyond the size of the original source. This would cause a panic because only the unconcatenated string is sent to the error reporting library miette.

Instead, the Router now parses introspection types separately and "concatenates" the definitions at the AST level.

By @SimonSapin in #2448

Always accept compressed subgraph responses (Issue #2415)

Previously, subgraph response decompression was only supported when subgraph request compression was explicitly configured. This is now always active.

By @Geal in #2450

Fix handling of root query operations not named Query

If you'd mapped your root query type to a name other than the default using schema { query: OtherQuery }, some parsing code in the Router would incorrectly return an error because it assumed the default name of Query. The same error would have occurred if the root mutation type was not named Mutation.

This is now corrected and the Router understands the mapping.
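The lookup involved can be sketched as below: use the names from the schema definition when present, and fall back to the defaults otherwise. The `SchemaDefinition` struct and its methods are hypothetical and exist only for this example.

```rust
/// Root operation type names taken from an optional `schema { ... }`
/// definition. (Hypothetical type for illustration only.)
#[derive(Default)]
struct SchemaDefinition {
    query: Option<String>,
    mutation: Option<String>,
}

impl SchemaDefinition {
    /// Name of the root query type, defaulting to `Query`.
    fn query_type(&self) -> &str {
        self.query.as_deref().unwrap_or("Query")
    }

    /// Name of the root mutation type, defaulting to `Mutation`.
    fn mutation_type(&self) -> &str {
        self.mutation.as_deref().unwrap_or("Mutation")
    }
}
```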

By @SimonSapin in #2459

Remove the locations field from subgraph errors (Issue #2297)

Subgraph errors can come with a locations field indicating which part of the query was causing issues, but it refers to the subgraph query generated by the query planner, and we have no way of translating it to locations in the client query. To avoid confusion, we've removed this field from the response until we can provide a more coherent way to map these errors back to the original operation.

By @Geal in #2442

Emit metrics showing number of client connections (Issue #2384)

New metrics are available to track the client connections:

  • apollo_router_session_count_total indicates the number of currently connected clients
  • apollo_router_session_count_active indicates the number of in flight GraphQL requests from connected clients.

This also fixes the behaviour when we reach the maximum number of file descriptors: instead of going into a busy loop, the router will wait a bit before accepting a new connection.

By @Geal in #2395

--dev will no longer modify configuration that it does not directly touch (Issue #2404, Issue #2481)

Previously, the Router's --dev mode operated against the configuration object model. This meant that it would sometimes replace pieces of configuration where it should have merely modified them. Now, --dev mode will override the following properties in the YAML config, but it will leave any adjacent configuration as it was:

homepage:
  enabled: false
include_subgraph_errors:
  all: true
plugins:
  experimental.expose_query_plan: true
sandbox:
  enabled: true
supergraph:
  introspection: true
telemetry:
  tracing:
    experimental_response_trace_id:
      enabled: true
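The difference between replacing and modifying can be sketched as a recursive merge over a configuration tree: overrides win on individual keys, and sibling keys are left untouched. The `Value` type and the `merge` and `map` helpers are assumptions made for this example, not the Router's configuration model.

```rust
use std::collections::BTreeMap;

/// A tiny stand-in for a YAML value (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Convenience constructor for a `Value::Map`.
fn map(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Recursively merges `overrides` into `base`. Keys present only in
/// `base` are kept; keys present in both are merged or overwritten.
fn merge(base: &mut Value, overrides: &Value) {
    match (base, overrides) {
        (Value::Map(base_map), Value::Map(override_map)) => {
            for (key, value) in override_map {
                if let Some(existing) = base_map.get_mut(key) {
                    merge(existing, value);
                } else {
                    base_map.insert(key.clone(), value.clone());
                }
            }
        }
        // Scalars (or mismatched shapes): the override replaces the value.
        (slot, replacement) => *slot = replacement.clone(),
    }
}
```

Merging an override of `supergraph.introspection: true` into a configuration that already sets `supergraph.listen` keeps the `listen` value in place.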

By @BrynCooke in #2489

🛠 Maintenance

Improve #[serde(default)] attribute on structs (Issue #2424)

If every field of a struct should fall back to its default value, put #[serde(default)] on the struct rather than on each field. If some fields need specific default values, write your own impl Default for the struct.

Correct approach

#[serde(deny_unknown_fields, default)]
struct Export {
    url: Url,
    enabled: bool
}

impl Default for Export {
  fn default() -> Self {
    Self {
      url: default_url_fn(),
      enabled: false
    }
  }
}

Discouraged approach

#[serde(deny_unknown_fields)]
struct Export {
    #[serde(default="default_url_fn")]
    url: Url,
    #[serde(default)]
    enabled: bool
}

By @bnjjj in #2424

📃 Configuration

Configuration changes will be automatically migrated on load. However, you should update your source configuration files as these will become breaking changes in a future major release.

health-check has been renamed to health_check (Issue #2161)

The health_check option in the configuration has been renamed to use snake_case rather than kebab-case for consistency with the other properties in the configuration:

-health-check:
+health_check:
   enabled: true

By @BrynCooke in #2451 and #2463

📚 Documentation

Disabling anonymous usage metrics (Issue #2478)

To disable the anonymous usage metrics, you set APOLLO_TELEMETRY_DISABLED=true in the environment. The documentation previously said to use 1 as the value instead of true. In the future, either will work, so this is primarily a...


v1.9.0

23 Jan 15:26
8123e8a

🚀 Features

Add support for base64::encode() / base64::decode() in Rhai (Issue #2025)

Two new functions, base64::encode() and base64::decode(), have been added to the capabilities available within Rhai scripts to Base64-encode or Base64-decode strings, respectively.

By @garypen in #2394

Override the root TLS certificate list for subgraph requests (Issue #1503)

In some cases, users need to use self-signed certificates or use a custom certificate authority (CA) when communicating with subgraphs.

It is now possible to configure these certificate-related details for either specific subgraphs or all subgraphs, as follows:

tls:
  subgraph:
    all:
      certificate_authorities: "${file./path/to/ca.crt}"
    # Use a separate certificate for the `products` subgraph.
    subgraphs:
      products:
        certificate_authorities: "${file./path/to/product_ca.crt}"

The file referenced in the certificate_authorities value is expected to be the combination of several PEM certificates, concatenated together into a single file (as is commonplace with Apache TLS configuration).
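To illustrate the expected file layout, the sketch below splits such a bundle into its individual PEM blocks. It is a simplified, std-only example that only recognises `CERTIFICATE` blocks; the function name `split_pem_bundle` is hypothetical, and real code would normally rely on a PEM parsing library.

```rust
/// Splits a PEM bundle into its individual certificate blocks.
/// Text outside BEGIN/END markers is ignored.
/// (Hypothetical helper for illustration only.)
fn split_pem_bundle(bundle: &str) -> Vec<String> {
    const BEGIN: &str = "-----BEGIN CERTIFICATE-----";
    const END: &str = "-----END CERTIFICATE-----";
    let mut certificates = Vec::new();
    let mut current: Option<Vec<&str>> = None;
    for line in bundle.lines() {
        let line = line.trim();
        if line == BEGIN {
            // Start collecting a new certificate block.
            current = Some(vec![line]);
        } else if line == END {
            // Close the block in progress, if any, and store it.
            if let Some(mut block) = current.take() {
                block.push(line);
                certificates.push(block.join("\n"));
            }
        } else if let Some(block) = current.as_mut() {
            block.push(line);
        }
    }
    certificates
}
```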

These certificates are only configurable via the Router's configuration since using SSL_CERT_FILE would also override certificates for sending telemetry and communicating with Apollo Uplink.

While we do not currently support terminating TLS at the Router (from clients), the tls section is located at the root of the configuration file to allow all TLS-related configuration to be semantically grouped together in the future.

Note: If you are attempting to use a self-signed certificate, it must be generated with a proper extension file and with basicConstraints disabled. For example, a v3.ext extension file:

subjectKeyIdentifier   = hash
authorityKeyIdentifier = keyid:always,issuer:always
# this has to be disabled
# basicConstraints       = CA:TRUE
keyUsage               = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyAgreement, keyCertSign
subjectAltName         = DNS:local.apollo.dev
issuerAltName          = issuer:copy

Using this v3.ext file, the certificate can be generated with the appropriate certificate signing request (CSR) - in this example, server.csr - using the following openssl command:

openssl x509 -req -in server.csr -signkey server.key -out server.crt -extfile v3.ext

This will produce the file as server.crt which can be passed as certificate_authorities.

By @Geal in #2008

Measure the Router's processing time (Issue #1949 Issue #2057)

The Router now emits a metric called apollo_router_processing_time which measures the time spent executing the request minus the time spent waiting for external requests (e.g., subgraph request/response or external plugin request/response). This measurement accounts for both the time spent actually executing the request and the time spent waiting for concurrent client requests to be executed. The metric is reported in seconds, as with the Router's other time-related metrics, though this is not meant to suggest that the Router adds seconds of overhead.
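The arithmetic behind the metric can be sketched as total request time minus the time spent waiting on external calls. This is a simplified illustration that assumes the external waits do not overlap; the function name `processing_time` is hypothetical.

```rust
use std::time::Duration;

/// Computes processing time as the total request duration minus the
/// time spent waiting on external requests (subgraphs, external plugins).
/// Assumes the waits are sequential, so they can simply be summed.
/// (Hypothetical helper for illustration only.)
fn processing_time(total: Duration, external_waits: &[Duration]) -> Duration {
    let waited: Duration = external_waits.iter().sum();
    // Saturate at zero rather than panicking if the inputs are inconsistent.
    total.saturating_sub(waited)
}
```

For a request that took 120 ms overall, with 70 ms and 30 ms spent waiting on two subgraph calls, this yields 20 ms of Router processing time.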

By @Geal in #2371

Automated persisted queries support for subgraph requests (PR #2284)

Automatic persisted queries (APQ) (see our Apollo Server docs for useful context) can now be used for subgraph requests. This is disabled by default, and can be configured for all subgraphs or per subgraph:

supergraph:
  apq:
    subgraph:
      # override for all subgraphs
      all:
        enabled: false
      # override per subgraph
      subgraphs:
        products:
          enabled: true

By @krishna15898 and @Geal in #2284 and #2418

Allow the disabling of automated persisted queries (PR #2386)

Automatic persisted queries (APQ) support is still enabled by default on the client side, but can now be disabled in the configuration:

supergraph:
  apq:
    enabled: false

By @Geal in #2386

Anonymous product usage analytics (Issue #2124, Issue #2397, Issue #2412)

Following up on #1630, the Router transmits anonymous usage telemetry about configurable feature usage, which helps guide Router product development. Our usage collection does not transmit any request-specific information. Knowing which features and configuration our users depend on allows us to evaluate opportunities to reduce complexity and remain diligent about the surface area of the Router over time. The privacy of your data and your users' data is of critical importance to the core Router team, and we handle it with great care in accordance with our privacy policy, which clearly states which data we collect and transmit and offers information on how to opt out.

Booleans and numeric values are included, however, any strings are represented as <redacted> to avoid leaking confidential or sensitive information.

For example:

{
   "session_id": "fbe09da3-ebdb-4863-8086-feb97464b8d7", // Randomly generated at Router startup.
   "version": "1.4.0", // The version of the router
   "os": "linux",
   "ci": null, // If CI is detected then this will name the CI vendor
   "usage": {
     "configuration.headers.all.request.propagate.named.<redacted>": 3,
     "configuration.headers.all.request.propagate.default.<redacted>": 1,
     "configuration.headers.all.request.len": 3,
     "configuration.headers.subgraphs.<redacted>.request.propagate.named.<redacted>": 2,
     "configuration.headers.subgraphs.<redacted>.request.len": 2,
     "configuration.headers.subgraphs.len": 1,
     "configuration.homepage.enabled.true": 1,
     "args.config-path.redacted": 1,
     "args.hot-reload.true": 1,
     //Many more keys. This is dynamic and will change over time.
     //More...
     //More...
     //More...
   }
 }
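The redaction rule can be sketched as a walk over a configuration tree that counts each leaf, keeping booleans and numbers but replacing every string with `<redacted>`. The `Config` type, the `collect_usage` function, and the exact key format are assumptions modelled on the example above. The sketch does not redact user-defined map keys (such as subgraph names) or report list lengths, both of which appear in the real payload.

```rust
use std::collections::BTreeMap;

/// A tiny stand-in for a configuration value (illustration only).
enum Config {
    Bool(bool),
    Number(i64),
    Str(String),
    Map(BTreeMap<String, Config>),
}

/// Flattens `config` into dotted usage keys with occurrence counts.
/// String values never appear in the output; they become `<redacted>`.
fn collect_usage(prefix: &str, config: &Config, usage: &mut BTreeMap<String, u64>) {
    match config {
        Config::Bool(b) => *usage.entry(format!("{}.{}", prefix, b)).or_insert(0) += 1,
        Config::Number(n) => *usage.entry(format!("{}.{}", prefix, n)).or_insert(0) += 1,
        Config::Str(_) => *usage.entry(format!("{}.<redacted>", prefix)).or_insert(0) += 1,
        Config::Map(map) => {
            for (key, value) in map {
                collect_usage(&format!("{}.{}", prefix, key), value, usage);
            }
        }
    }
}
```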

Users can disable this mechanism by setting the environment variable APOLLO_TELEMETRY_DISABLED=true.

By @BrynCooke in #2173, #2398, #2413

🐛 Fixes

Don't send header names to Studio if send_headers is none (Issue #2403)

We no longer transmit header names to Apollo Studio when send_headers is set to none (the default). Previously, when send_headers was set to none (as in the following example), the header names were still transmitted with empty header values. No actual values were ever sent unless send_headers was set to a more permissive option like forward_headers_only or forward_headers_except.

telemetry:
  apollo:
    send_headers: none

By @bnjjj in #2425

Respond with Content-type: application/json when encountering incompatible Content-type or Accept request headers (Issue #2334)

When receiving requests with content-type and accept header mismatches (e.g., on multipart requests) the Router now utilizes a correct content-type header in its response.

By @Meemaw in #2370

Fix APOLLO_USAGE_REPORTING_INGRESS_URL behavior when Router was run without a configuration file

The environment variable APOLLO_USAGE_REPORTING_INGRESS_URL (not usually necessary under typical operation) was not being applied correctly when the Router was run without a configuration file.
In addition, defaulting of environment variables now injects the variable directly rather than injecting it via an expansion expression. This means that the use of APOLLO_ROUTER_CONFIG_ENV_PREFIX (even less common) doesn't affect injected configuration defaults.

By @BrynCooke in #2432

🛠 Maintenance

Remove unused factory traits (PR #2372)

We removed a factory trait that was only used in a single implementation, which removes the overall requirement that execution and subgraph building take place via that factory trait.

By @Geal in #2372

Optimize header propagation plugin's regular expression matching (PR #2392)

We've changed the header propagation plugin's behavior to reduce the chance of memory allocations occurring when applying regex-based header propagation rules.

By @o0Ignition0o in #2392

📚 Documentation

Creating custom metrics in plugins (Issue #2294)

To create your custom metrics in Prometheus you can use the tracing macros to generate an event. If you observe a specific naming pattern for your event, you'll be able to...
