Releases · apollographql/router

11 Jan 18:22

v1.8.0

5c33aff


📃 Configuration

Configuration changes will be automatically migrated on load. However, you should update your source configuration files as these will become breaking changes in a future major release.

Defer support graduates from preview (Issue #2368)

We're pleased to announce that @defer support has been promoted to general availability in accordance with our product launch stages.

Defer is enabled by default in the Router. However, if you previously disabled defer support explicitly via configuration, you will need to update your configuration accordingly, since the preview_defer_support option has been renamed to defer_support:

Before:

supergraph:
  preview_defer_support: true

After:

supergraph:
  defer_support: true

By @BrynCooke in #2378

Remove `timeout` from OTLP exporter (Issue #2337)

A duplicative timeout property has been removed from the telemetry.tracing.otlp object, since the batch_processor configuration already contains a timeout property. The Router will tolerate both options for now, but this will become a breaking change in a future major release. Please update your configuration now to reduce future work.

Before:

telemetry:
  tracing:
    otlp:
      timeout: 5s

After:

telemetry:
  tracing:
    otlp:
      batch_processor:
        timeout: 5s
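Conceptually, both migrations above move a value from one dotted key path to another in the parsed configuration. The following std-only Rust sketch illustrates that idea and is not the router's migration code: the `Value` tree and the `take`, `insert`, and `move_key` helpers are hypothetical stand-ins for a real YAML representation.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified configuration tree (the router works on real YAML/JSON).
#[derive(Debug, PartialEq)]
enum Value {
    Scalar(String),
    Map(BTreeMap<String, Value>),
}

fn scalar(s: &str) -> Value {
    Value::Scalar(s.to_string())
}

fn obj(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Removes and returns the value at `path`, if every segment exists.
fn take(root: &mut Value, path: &[&str]) -> Option<Value> {
    let (last, parents) = path.split_last()?;
    let mut node = root;
    for segment in parents {
        match node {
            Value::Map(map) => node = map.get_mut(*segment)?,
            Value::Scalar(_) => return None,
        }
    }
    match node {
        Value::Map(map) => map.remove(*last),
        Value::Scalar(_) => None,
    }
}

/// Inserts `value` at `path`, creating intermediate maps; gives the value back on failure.
fn insert(root: &mut Value, path: &[&str], value: Value) -> Result<(), Value> {
    let Some((last, parents)) = path.split_last() else { return Err(value) };
    let mut node = root;
    for segment in parents {
        match node {
            Value::Map(map) => {
                node = map
                    .entry(segment.to_string())
                    .or_insert_with(|| Value::Map(BTreeMap::new()))
            }
            Value::Scalar(_) => return Err(value),
        }
    }
    match node {
        Value::Map(map) => {
            map.insert(last.to_string(), value);
            Ok(())
        }
        Value::Scalar(_) => Err(value),
    }
}

/// Moves the value at dotted path `from` to dotted path `to`. Returns true if it moved.
fn move_key(root: &mut Value, from: &str, to: &str) -> bool {
    let from: Vec<&str> = from.split('.').collect();
    let to: Vec<&str> = to.split('.').collect();
    let Some(value) = take(root, &from) else { return false };
    match insert(root, &to, value) {
        Ok(()) => true,
        Err(value) => {
            // Destination blocked by a scalar: put the value back where it was.
            let _ = insert(root, &from, value);
            false
        }
    }
}

fn main() {
    let otlp = obj(vec![("timeout", scalar("5s"))]);
    let mut config = obj(vec![
        ("supergraph", obj(vec![("preview_defer_support", scalar("true"))])),
        ("telemetry", obj(vec![("tracing", obj(vec![("otlp", otlp)]))])),
    ]);
    move_key(&mut config, "supergraph.preview_defer_support", "supergraph.defer_support");
    move_key(
        &mut config,
        "telemetry.tracing.otlp.timeout",
        "telemetry.tracing.otlp.batch_processor.timeout",
    );
    println!("{config:#?}");
}
```

A missing source key is simply a no-op, which is what lets a migration like this run safely on every load.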

By @BrynCooke in #2338

🚀 Features

The Helm chart has graduated from prerelease to general availability (PR #2380)

As part of this release, we have promoted the Helm chart from its prerelease "release-candidate" stage to a "stable" version number. We have chosen to match the version of the Helm chart to the Router version, which fits well with our automated Router release pipeline. This means the first stable version of the Helm chart is 1.8.0, which pairs with Router 1.8.0; subsequent versions will remain in lock-step.

By @abernix in #2380

Emit hit/miss metrics for APQ, Query Planning and Introspection caches (Issue #1985)

Added hit/miss metrics for caching.
Each cache metric contains a kind attribute indicating the kind of cache (query planner, apq, introspection)
and a storage attribute indicating the backing storage (e.g., memory or disk).

The following metrics are exposed:
apollo_router_cache_hit_count - cache hits.

apollo_router_cache_miss_count - cache misses.

apollo_router_cache_hit_time - cache hit duration.

apollo_router_cache_miss_time - cache miss duration.

Example

# TYPE apollo_router_cache_hit_count counter
apollo_router_cache_hit_count{kind="query planner",new_test="my_version",service_name="apollo-router",storage="memory"} 2
# TYPE apollo_router_cache_hit_time histogram
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.001"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.005"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.015"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.05"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.1"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.2"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.3"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.4"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.5"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="1"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="5"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="10"} 2
apollo_router_cache_hit_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="+Inf"} 2
apollo_router_cache_hit_time_sum{kind="query planner",service_name="apollo-router",storage="memory"} 0.000236782
apollo_router_cache_hit_time_count{kind="query planner",service_name="apollo-router",storage="memory"} 2
# HELP apollo_router_cache_miss_count apollo_router_cache_miss_count
# TYPE apollo_router_cache_miss_count counter
apollo_router_cache_miss_count{kind="query planner",service_name="apollo-router",storage="memory"} 1
# HELP apollo_router_cache_miss_time apollo_router_cache_miss_time
# TYPE apollo_router_cache_miss_time histogram
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.001"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.005"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.015"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.05"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.1"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.2"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.3"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.4"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="0.5"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="1"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="5"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="10"} 1
apollo_router_cache_miss_time_bucket{kind="query planner",service_name="apollo-router",storage="memory",le="+Inf"} 1
apollo_router_cache_miss_time_sum{kind="query planner",service_name="apollo-router",storage="memory"} 0.000186783
apollo_router_cache_miss_time_count{kind="query planner",service_name="apollo-router",storage="memory"} 1
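As a rough illustration of how the two counters relate to the kind and storage attributes, here is a std-only Rust sketch that counts hits and misses per (kind, storage) pair and renders them in the Prometheus text format shown above. `CacheMetrics` is hypothetical, the duration histograms are omitted, and service_name is hard-coded only to mirror the example; the router itself exports these metrics through its telemetry pipeline.

```rust
use std::collections::BTreeMap;

/// Hit/miss counters keyed by (kind, storage), e.g. ("query planner", "memory").
#[derive(Default)]
struct CacheMetrics {
    hits: BTreeMap<(String, String), u64>,
    misses: BTreeMap<(String, String), u64>,
}

impl CacheMetrics {
    /// Records one cache lookup for the given cache kind and backing storage.
    fn record(&mut self, kind: &str, storage: &str, hit: bool) {
        let counters = if hit { &mut self.hits } else { &mut self.misses };
        *counters.entry((kind.to_string(), storage.to_string())).or_insert(0) += 1;
    }

    /// Renders both counters as Prometheus text exposition lines.
    fn render(&self) -> String {
        let mut out = String::new();
        for (name, counters) in [
            ("apollo_router_cache_hit_count", &self.hits),
            ("apollo_router_cache_miss_count", &self.misses),
        ] {
            out.push_str(&format!("# TYPE {name} counter\n"));
            for ((kind, storage), count) in counters {
                out.push_str(&format!(
                    "{name}{{kind=\"{kind}\",service_name=\"apollo-router\",storage=\"{storage}\"}} {count}\n"
                ));
            }
        }
        out
    }
}

fn main() {
    let mut metrics = CacheMetrics::default();
    metrics.record("query planner", "memory", true);
    metrics.record("query planner", "memory", true);
    metrics.record("query planner", "memory", false);
    print!("{}", metrics.render());
}
```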

By @bnjjj in #2327

Add support for single instance Redis (Issue #2300)

Experimental caching via Redis now works with single Redis instances when configured with a single URL.

By @bnjjj in #2310

Support TLS connections to single instance Redis (Issue #2332)

TLS connections are now supported when connecting to single Redis instances. This is useful for connecting to hosted Redis providers where TLS is mandatory.
TLS connections for clusters are not supported yet; see Issue #2332 for updates.

By @Geal in #2336

🐛 Fixes

Correctly handle aliased `__typename` fields (Issue #2330)

If you aliased __typename, as in this example query:

{
  myproducts: products {
    total
    __typename
  }
  _0___typename: __typename
}

Before this fix, _0___typename was set to null. It now correctly returns Query.

By @bnjjj in #2357

`subgraph_request` span is now set as the parent of traces coming from subgraphs (Issue #2344)

Before this fix, the trace context injected into the headers of subgraph requests was not attached to the correct parent span ID, so subgraph traces appeared disconnected when rendering the trace tree.
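As an illustration of what "correct parent" means on the wire, assume the W3C Trace Context propagator (the router supports other propagation formats too). Its traceparent header carries a version, the trace ID, the parent span ID, and flags, and the parent span ID should be that of the subgraph_request span. A minimal Rust sketch with a hypothetical `traceparent` helper and example IDs:

```rust
/// Formats a W3C `traceparent` header value: version-traceid-parentid-flags.
fn traceparent(trace_id: u128, parent_span_id: u64, sampled: bool) -> String {
    let flags: u8 = if sampled { 0x01 } else { 0x00 };
    format!("00-{trace_id:032x}-{parent_span_id:016x}-{flags:02x}")
}

fn main() {
    // Example IDs. The parent must be the subgraph_request span, not one of its ancestors.
    let trace_id = 0x4bf92f3577b34da6a3ce929d0e0e4736_u128;
    let subgraph_request_span_id = 0x00f067aa0ba902b7_u64;
    println!("traceparent: {}", traceparent(trace_id, subgraph_request_span_id, true));
}
```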

By @bnjjj in #2345

🛠 Maintenance

Simplify telemetry config code (Issue #2337)

This brings the telemetry plugin configuration closer to standards recommended in the YAML design guidance.

By @BrynCooke in #2338

Upgrade the `clap` version in scaffold templates (Issue #2165)

Upgraded the clap dependency to a version that supports generating scaffolded plugins via xtask.

By @bnjjj in #2343

Upgrade axum to `0.6.1` (PR #2303)

For more details about the new axum release, please read the project's changelog.

By @bnjjj in #2303

Set the HTTP response `content-type` as `application/json` when returning GraphQL errors (Issue #2320)

When throwing an INVALID_GRAPHQL_REQUEST error, the Router now specifies the expected content-type header rather than omitting the header as it was prev...

Contributors

Geal, BrynCooke, and 2 other contributors


23 Dec 16:38

apollo-bot2

v1.7.0

239b6b8


🚀 Features

Newly scaffolded projects now include a `Dockerfile` (Issue #2295)

Custom Router binary projects created using our scaffolding tooling will now have a Dockerfile emitted to facilitate building custom Docker containers.

By @o0Ignition0o in #2307

Apollo Uplink communication timeout is configurable (PR #2271)

The amount of time which can elapse before timing out when communicating with Apollo Uplink is now configurable via the APOLLO_UPLINK_TIMEOUT environment variable and the --apollo-uplink-timeout CLI flag, in a similar fashion to how the interval can be configured. It still defaults to 30 seconds.
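Resolving the effective timeout could look like the following Rust sketch. The precedence (CLI flag over environment variable over the 30-second default) and the accepted value format (45s or a bare number of seconds) are assumptions for illustration, and `uplink_timeout` is a hypothetical helper, not the router's CLI code.

```rust
use std::time::Duration;

/// Default Uplink timeout stated in the release notes.
const DEFAULT_UPLINK_TIMEOUT: Duration = Duration::from_secs(30);

/// Parses "45s" or "45" as whole seconds (hypothetical, simplified format handling).
fn parse_seconds(raw: &str) -> Option<Duration> {
    let trimmed = raw.trim();
    let digits = trimmed.strip_suffix('s').unwrap_or(trimmed);
    digits.parse::<u64>().ok().map(Duration::from_secs)
}

/// Picks the CLI flag value first, then the environment variable, then the default.
/// Unparseable values are skipped here for brevity; a real CLI would reject them.
fn uplink_timeout(cli_flag: Option<&str>, env_value: Option<&str>) -> Duration {
    cli_flag
        .and_then(parse_seconds)
        .or_else(|| env_value.and_then(parse_seconds))
        .unwrap_or(DEFAULT_UPLINK_TIMEOUT)
}

fn main() {
    let env_value = std::env::var("APOLLO_UPLINK_TIMEOUT").ok();
    let timeout = uplink_timeout(None, env_value.as_deref());
    println!("Uplink timeout: {timeout:?}");
}
```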

By @o0Ignition0o in #2271

Query plan cache is pre-warmed using existing operations when the supergraph changes (Issue #2302, Issue #2308)

A new warmed_up_queries configuration option has been introduced to pre-warm the query plan cache when the supergraph changes.

Under normal operation, query plans are cached to avoid the recomputation cost. However, when the supergraph changes, previously-planned queries must be re-planned to account for implementation changes in the supergraph, even though the query itself may not have changed. Under load, this re-planning can cause performance variations due to the extra computation work. To reduce the impact, it is now possible to pre-warm the query plan cache for the incoming supergraph, prior to changing over to the new supergraph. Pre-warming slightly delays the roll-over to the incoming supergraph, but allows the most-requested operations to not be impacted by the additional computation work.

To enable pre-warming, set warmed_up_queries in your configuration, as follows:

supergraph:
  query_planning:
    # Pre-plan the 100 most used operations when the supergraph changes.  (Default is "0", disabled.)
    warmed_up_queries: 100
    experimental_cache:
      in_memory:
        # Sets the limit of entries in the query plan cache
        limit: 512

Query planning was also updated to finish executing and populating the cache even if the response can't be returned to the client, which avoids throwing away computationally expensive work.
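The pre-warming step amounts to ranking previously requested operations by usage, re-planning the top warmed_up_queries of them against the incoming supergraph, and only then swapping caches. A minimal Rust sketch of that selection; `warm_up` is hypothetical and the `plan` closure stands in for the real query planner:

```rust
use std::collections::HashMap;

/// Re-plans the `warmed_up_queries` most-requested operations against the new
/// supergraph and returns a fresh cache to swap in. `plan` stands in for the planner.
fn warm_up(
    usage: &HashMap<String, u64>,
    warmed_up_queries: usize,
    plan: impl Fn(&str) -> String,
) -> HashMap<String, String> {
    let mut ranked: Vec<(&String, &u64)> = usage.iter().collect();
    // Most used first; ties broken by query text so the selection is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
    ranked
        .into_iter()
        .take(warmed_up_queries)
        .map(|(query, _)| (query.clone(), plan(query.as_str())))
        .collect()
}

fn main() {
    let usage = HashMap::from([
        ("{ me { id } }".to_string(), 120_u64),
        ("{ products { upc } }".to_string(), 80),
        ("{ reviews { body } }".to_string(), 5),
    ]);
    // With warmed_up_queries = 2, only the two most-used operations are pre-planned.
    let new_cache = warm_up(&usage, 2, |q| format!("plan for {q}"));
    println!("{} plans pre-computed", new_cache.len());
}
```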

By @Geal in #2309

🐛 Fixes

Propagate errors across inline fragments (PR #2304)

GraphQL errors are now correctly propagated across inline fragments.

By @o0Ignition0o in #2304

Only rebuild `protos` if `reports.proto` source changes

Apollo Studio accepts traces and metrics from Apollo Router via the Protobuf specification defined in the reports.proto file in the repository. With this contribution, we only rebuild from reports.proto when the file has actually changed, rather than on every build as before. This saves build time for developers.
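In Cargo, this behavior is controlled by the cargo:rerun-if-changed build-script directive: without any such directive, Cargo re-runs a build script whenever any file in the package changes, while with it Cargo re-runs the script only when a listed file changes. A minimal build.rs sketch; the proto path is a hypothetical placeholder, since the file's exact location in the repository isn't given here:

```rust
// build.rs (sketch). PROTO_PATH is a hypothetical placeholder path.
const PROTO_PATH: &str = "proto/reports.proto";

/// Builds the directives telling Cargo to re-run this script only when these files change.
fn rerun_directives(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .map(|path| format!("cargo:rerun-if-changed={path}"))
        .collect()
}

fn main() {
    for directive in rerun_directives(&[PROTO_PATH]) {
        println!("{directive}");
    }
    // Protobuf code generation from PROTO_PATH would follow here.
}
```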

By @scottdouglas1989 in #2283

Return an error on duplicate keys in configuration (Issue #1428)

Repeated keys in Router YAML can be hard to notice, but they indicate a misconfiguration that can cause unexpected behavior, since only one of the values can take effect. With this improvement, the following YAML configuration will raise an error at Router startup to alert the user to the misconfiguration:

telemetry:
  tracing:
    propagation:
      jaeger: true
  tracing:
    propagation:
      jaeger: false

In this particular example, the error produced would be:

ERROR duplicated keys detected in your yaml configuration: 'telemetry.tracing'
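Detecting this comes down to remembering, for each mapping, which keys it has already seen. The following std-only Rust sketch handles only a simplified YAML subset (key: or key: value lines indented with spaces; no sequences, flow style, anchors, or quoted keys), and `find_duplicate_keys` is a hypothetical helper rather than the router's implementation:

```rust
use std::collections::HashSet;

/// One open mapping: its indentation, its key, and the child keys seen so far.
struct Frame {
    indent: isize,
    key: String,
    children: HashSet<String>,
}

/// Returns the dotted paths of keys that appear twice within the same mapping.
fn find_duplicate_keys(yaml: &str) -> Vec<String> {
    let mut stack = vec![Frame { indent: -1, key: String::new(), children: HashSet::new() }];
    let mut duplicates = Vec::new();
    for line in yaml.lines() {
        let trimmed = line.trim_start();
        // Skip blank lines, comments, and sequence items, which this sketch doesn't model.
        if trimmed.is_empty() || trimmed.starts_with('#') || trimmed.starts_with('-') {
            continue;
        }
        let Some((key, _value)) = trimmed.split_once(':') else { continue };
        let key = key.trim();
        let indent = (line.len() - trimmed.len()) as isize;
        // Close every mapping at the same or a deeper indentation level.
        while stack.last().map_or(false, |f| f.indent >= indent) {
            stack.pop();
        }
        let parent = stack.last_mut().expect("root frame is never popped");
        if !parent.children.insert(key.to_string()) {
            let mut path: Vec<&str> = stack.iter().skip(1).map(|f| f.key.as_str()).collect();
            path.push(key);
            duplicates.push(path.join("."));
        }
        stack.push(Frame { indent, key: key.to_string(), children: HashSet::new() });
    }
    duplicates
}

fn main() {
    let config = "telemetry:\n  tracing:\n    propagation:\n      jaeger: true\n  tracing:\n    propagation:\n      jaeger: false\n";
    for path in find_duplicate_keys(config) {
        eprintln!("ERROR duplicated keys detected in your yaml configuration: '{path}'");
    }
}
```

Each repeated tracing block starts a fresh mapping, which is why only telemetry.tracing is reported and not the nested keys beneath it.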

By @bnjjj in #2270

Return requested `__typename` in initial chunk of a deferred response (Issue #1922)

The special-case __typename field is now handled correctly when requested at the root level of an operation that uses @defer. For example, consider the following query:

{
  __typename
  ...deferedFragment @defer
}

fragment deferedFragment on Query {
  slow
}

The Router now exhibits the correct behavior for this query with __typename being returned as soon as possible in the initial chunk, as follows:

{"data":{"__typename": "Query"},"hasNext":true}

By @bnjjj in #2274

Log retriable Apollo Uplink failures at the `debug` level (Issue #2004)

Messages about Apollo Uplink schema fetch failures are now logged at the debug level to reduce noise, since such failures are retried immediately and do not indicate an actual error.

By @bnjjj in #2215

Traces won't cause missing field-stats (Issue #2267)

Metrics are now measured comprehensively, and traces obey the trace sampling configuration. Previously, a request that was sampled out of tracing would not always contribute to metrics correctly. This was particularly problematic for users who had configured high sampling rates for their traces.

By @BrynCooke in #2277 and #2286

Replace default `notify` watcher mechanism with `PollWatcher` (Issue #2245)

We have replaced the default mechanism used by our underlying file-system notification library, notify, with PollWatcher. We made this more aggressive change because of continued reports of failed hot-reloading; it follows our previous replacement of hotwatch. We don't have very demanding file-watching requirements, so while PollWatcher offers less sophisticated functionality and slightly slower reactivity, it is at least consistent on all platforms and should provide the best developer experience.

By @garypen in #2276

Preserve subgraph error's `path` property when redacting subgraph errors (Issue #1818)

The path property in errors is now preserved. Previously, error redaction removed the error's path property, which not only made debugging difficult but also made it impossible to correctly match errors from deferred responses to the appropriate fields in the requested operation. Since the response shape for the primary and deferred responses is defined by the client-facing "API schema" rather than the supergraph, this change will not leak internal supergraph implementation details to clients, and the result will be consistent even if the subgraph that provides a particular field changes over time.

By @Geal in #2273

Use correct URL decoding for `variables` in HTTP `GET` requests (Issue #2248)

The correct URL decoding will now be applied when making a GET request that passes in the variables query string parameter. Previously, all '+' characters were replaced with spaces, which broke cases where the + symbol was not merely an encoding symbol (e.g., ISO 8601 date-time values with timezone information).
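The difference is between form decoding, where + stands for a space, and plain percent-decoding, where only %XX escapes are decoded and + is kept as-is. A std-only Rust sketch of the latter, assuming that is the desired behavior for the variables parameter; `percent_decode` is a hypothetical helper:

```rust
/// Decodes `%XX` escapes and leaves every other byte, including `+`, untouched.
/// Returns None for malformed escapes or if the result is not valid UTF-8.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]); // `+` is copied as-is, not turned into a space
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

fn main() {
    // An ISO 8601 timestamp with a timezone offset inside an encoded variables object.
    let raw = "%7B%22from%22%3A%222022-12-01T10%3A00%3A00+01%3A00%22%7D";
    println!("{:?}", percent_decode(raw));
}
```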

By @neominik in #2249

🛠 Maintenance

Return additional details to client for invalid GraphQL requests (Issue #2301)

Additional context will be returned to clients in the error indicating the source of the error when an invalid GraphQL request is made. For example, passing a string instead of an object for the variables property will now inform the client of the mistake, providing a better developer experience:

{
  "errors": [
    {
      "message": "Invalid GraphQL request",
      "extensions": {
        "details": "failed to deserialize the request body into JSON: invalid type: string \"null\", expected a map at line 1 column 100",
        "code": "INVALID_GRAPHQL_REQUEST"
      }
    }
  ]
}

By @bnjjj in #2306

OpenTelemetry spans to subgraphs now include the request URL (Issue #2280)

A new http.url attribute has been attached to subgraph_request OpenTelemetry trace spans, specifying the URL the request was made to.

By @bnjjj in #2292

Errors returned to clients are now more consistently formed (Issue #2101)

Errors are now returned in a shape more consistent with those returned by Apollo Gateway and Apollo Server, and with those shown in the documentation. In particular, when available, a stable code field will be included in the error's extensions.

By @bnjjj in #2178

🧪 Experimental

Note

These features are subject to change slightly (usually, in terms of naming or interfaces) before graduating to general availability.

[Read mo...

Contributors

garypen, Geal, and 5 other contributors


13 Dec 17:49

apollo-bot2

v1.6.0

26a5d5b


🚀 Features

Add support for experimental tooling (Issue #2136)

The Router now displays a message at startup listing the experimental_ configuration options in use, along with their related GitHub discussions.
It also adds a new CLI command, router config experimental, which displays all available experimental configuration options.

By @bnjjj in #2242

Re-deploy router pods if the SuperGraph configmap changes (PR #2223)

When setting the supergraph with the supergraphFile variable, a sha256 checksum is calculated and set as an annotation on the router pods. This spins up new pods when the supergraph is mounted via a config map and the schema has changed.

Note: It is preferable not to enable --hot-reload with this feature, since re-configuring the router during a pod restart duplicates work and may cause confusing log messages.

By @toneill818 in #2223

Tracing batch span processor is now configurable (Issue #2232)

Exporting traces often requires performance tuning based on the throughput of the router, sampling settings and ingestion capability of tracing ingress.

All exporters now support configuring the batch span processor in the router yaml.

telemetry:
  apollo:
    batch_processor:
      scheduled_delay: 100ms
      max_concurrent_exports: 1000
      max_export_batch_size: 10000
      max_export_timeout: 100s
      max_queue_size: 10000
  tracing:
    jaeger|zipkin|otlp|datadog:
      batch_processor:
        scheduled_delay: 100ms
        max_concurrent_exports: 1000
        max_export_batch_size: 10000
        max_export_timeout: 100s
        max_queue_size: 10000
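Per the OpenTelemetry specification, a batch span processor buffers finished spans in a bounded queue (max_queue_size, beyond which spans are dropped) and exports them in batches of at most max_export_batch_size every scheduled_delay. The queueing part can be sketched in Rust as follows; `BatchQueue` is hypothetical, and the timer, concurrency, and timeout settings are left out:

```rust
use std::collections::VecDeque;

/// Simplified model of a batch span processor's buffer (timer and exporter omitted).
struct BatchQueue<T> {
    queue: VecDeque<T>,
    max_queue_size: usize,
    max_export_batch_size: usize,
    dropped: usize,
}

impl<T> BatchQueue<T> {
    fn new(max_queue_size: usize, max_export_batch_size: usize) -> Self {
        Self { queue: VecDeque::new(), max_queue_size, max_export_batch_size, dropped: 0 }
    }

    /// Called when a span ends; the span is dropped if the queue is already full.
    fn on_end(&mut self, span: T) {
        if self.queue.len() >= self.max_queue_size {
            self.dropped += 1;
        } else {
            self.queue.push_back(span);
        }
    }

    /// Called every `scheduled_delay`: takes up to one batch for the exporter.
    fn next_batch(&mut self) -> Vec<T> {
        let n = self.queue.len().min(self.max_export_batch_size);
        self.queue.drain(..n).collect()
    }
}

fn main() {
    let mut queue = BatchQueue::new(4, 3);
    for span in 0..6 {
        queue.on_end(span);
    }
    let batch = queue.next_batch();
    println!("batch: {batch:?}, dropped: {}", queue.dropped);
}
```

If exports can't keep up with throughput, the queue fills and spans are dropped, which is why these settings often need tuning together.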

See the Open Telemetry docs for more information.

By @BrynCooke in #1970

Add hot-reload support for Rhai scripts (Issue #1071)

The router will "watch" your "rhai.scripts" directory for changes and trigger an interpreter reload if changes are detected. Changes are defined as:

creating a new file with a ".rhai" suffix
modifying or removing an existing file with a ".rhai" suffix

The watch is recursive, so files in sub-directories of the "rhai.scripts" directory are also watched.

The Router attempts to identify errors in scripts before applying the changes. If errors are detected, these will be logged and the changes will not be applied to the runtime. Not all classes of error can be reliably detected, so check the log output of your router to make sure that changes have been applied.

By @garypen in #2198

Add support for working with multi-value header keys to Rhai (Issue #2211, Issue #2255)

Adds support for setting a header map key with an array. This causes the HeaderMap key/values to be appended() to the map, rather than inserted().

Adds support for a new values() fn which retrieves multiple values for a HeaderMap key as an array.

Example use from Rhai:

  response.headers["set-cookie"] = [
    "foo=bar; Domain=localhost; Path=/; Expires=Wed, 04 Jan 2023 17:25:27 GMT; HttpOnly; Secure; SameSite=None",
    "foo2=bar2; Domain=localhost; Path=/; Expires=Wed, 04 Jan 2023 17:25:27 GMT; HttpOnly; Secure; SameSite=None",
  ];
  response.headers.values("set-cookie"); // Returns the array of values

By @garypen in #2219, #2258

🐛 Fixes

Filter nullified deferred responses (Issue #2213)

Updates to the @defer specification mandate that a deferred response should not be sent if its path points to an element of the response that was nullified in a previous payload.
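The rule boils down to a path-prefix check: a deferred payload is dropped when an earlier payload nullified its path or one of its ancestors. A minimal Rust sketch with a hypothetical `PathElement` type and `should_send` helper:

```rust
/// One step in a response path, e.g. ["products", 0, "reviews"].
#[derive(Debug, Clone, PartialEq)]
enum PathElement {
    Key(String),
    Index(usize),
}

/// A deferred payload is sent only if no path nullified by an earlier
/// payload is a prefix of (or equal to) its own path.
fn should_send(deferred_path: &[PathElement], nullified: &[Vec<PathElement>]) -> bool {
    !nullified.iter().any(|n| deferred_path.starts_with(n))
}

fn main() {
    use PathElement::{Index, Key};
    // Suppose the primary payload nullified products[0].
    let nullified = vec![vec![Key("products".into()), Index(0)]];
    let under_nullified = [Key("products".into()), Index(0), Key("reviews".into())];
    let sibling = [Key("products".into()), Index(1), Key("reviews".into())];
    println!(
        "{} {}",
        should_send(&under_nullified, &nullified),
        should_send(&sibling, &nullified)
    );
}
```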

By @Geal in #2184

Return root `__typename` when part of a query with a deferred fragment (Issue #1677)

With this query:

{
  __typename
  fast
  ...deferedFragment @defer
}

fragment deferedFragment on Query {
  slow
}

You will receive the first response chunk:

{"data":{"__typename": "Query", "fast":0},"hasNext":true}

By @bnjjj in #2188

Wait for opentelemetry tracer provider to shutdown (PR #2191)

When Telemetry is dropped, we spawn a thread to perform the global OpenTelemetry tracer provider shutdown. The documentation of this function indicates that "This will invoke the shutdown method on all span processors. span processors should export remaining spans before return". We now give that process some time to complete (currently 5 seconds) before returning from the drop, which provides more opportunity for spans to be exported.
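A bounded wait of this kind can be sketched in Rust with a channel: run the shutdown on its own thread and wait at most a fixed time for it to signal completion. `shutdown_with_timeout` is hypothetical, and the closure stands in for the global tracer provider shutdown:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `shutdown` on its own thread and waits up to `timeout` for it to finish.
/// Returns true if it completed in time; otherwise the thread keeps running detached.
fn shutdown_with_timeout<F>(shutdown: F, timeout: Duration) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        shutdown(); // e.g. span processors export their remaining spans here
        let _ = done_tx.send(());
    });
    // Also returns false if the shutdown thread panics (the sender is dropped).
    done_rx.recv_timeout(timeout).is_ok()
}

fn main() {
    let finished = shutdown_with_timeout(
        || thread::sleep(Duration::from_millis(50)),
        Duration::from_secs(5),
    );
    println!("spans flushed before timeout: {finished}");
}
```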

By @garypen in #2191

Dispatch errors from the primary response to deferred responses (Issue #1818, Issue #2185)

When errors are generated during the primary execution, some may also be assigned to deferred responses.

By @Geal in #2192

Reconstruct deferred queries with knowledge about fragments (Issue #2105)

When using @defer, response formatting must apply to a subset of the query (primary or deferred) that is reconstructed from information provided by the query planner: a path into the response and a subselection. Previously, that path did not include information on fragment application, which resulted in query reconstruction issues if @defer was used under a fragment application on an interface.

By @Geal in #2109

🛠 Maintenance

Improve plugin registration predictability (PR #2181)

This replaces ctor with linkme. ctor enables Rust code to execute before main, which can be a source of undefined behaviour, and we don't need our code to run before main. linkme provides a registration mechanism that is well suited to this use case, so switching to it makes the router more predictable, simpler to reason about, and a sounder basis for future plugin enhancements.

By @garypen in #2181

it_rate_limit_subgraph_requests fixed (Issue #2213)

This test was failing frequently because it is a timing test and was being run in a single-threaded tokio runtime.

By @BrynCooke in #2218

Update reports.proto protobuf definition (PR #2247)

Updated the reports.proto file and changed the prompt for updating the file so that it points to the correct new location.

By @o0Ignition0o in #2247

Upgrade OpenTelemetry to 0.18 (Issue #1970)

Update to OpenTelemetry 0.18.

By @bryncooke and @bnjjj in #1970 and #2236

Remove spaceport (Issue #2233)

Removing spaceport significantly simplifies the telemetry code and is likely to increase performance and reliability.

By @bryncooke in #1970

Update to Rust 1.65 (Issue #2220)

Rust MSRV incremented to 1.65.

By @bryncooke in #2221 and #2240

Improve automated release (Pull #2220)

Improved the automated release to:

Update the scaffold files
Improve the names of the prepare-release steps in CircleCI.

By @bryncooke in #2256

Use Elastic-2.0 license spdx (PR #2055)

Now that Elastic-2.0 is a valid SPDX identifier in the Rust ecosystem, we can update the router references.

By @o0Ignition0o in #2054

Configuration

Protoc now required to build from source (Issue #1970)

Protoc is now required to build Apollo Router. Upgrading to OpenTelemetry 0.18 enabled us to upgrade tonic, which no longer bundles protoc.
Users must install it themselves: https://grpc.io/docs/protoc-installation/.

By @bryncooke in #1970

Jaeger scheduled_delay moved to batch_processor->scheduled_delay ([Issue #2232](https://github.com/apollographql/router/issu...

Contributors

garypen, Geal, and 3 other contributors


06 Dec 10:58

apollo-bot2

v1.5.0

ff9bfba


🚀 Features

Add configuration for trace ID (Issue #2080)

Trace ids can be propagated directly from a request header:

telemetry:
  tracing:
    propagation:
      # If you have your own way to generate a trace id and you want to pass it via a custom request header
      request:
        header_name: my-trace-id

In addition, the trace ID can be exposed via a response header:

telemetry:
  tracing:
    experimental_response_trace_id:
      enabled: true # default: false
      header_name: "my-trace-id" # default: "apollo-trace-id"

With this configuration, responses include a header called my-trace-id containing the trace ID. This can help you debug a specific query by grepping your logs for that trace ID to get more context.

By @bnjjj in #2131

Add configuration for logging and add more logs (Issue #1998)

By default, logs do not contain the request body, response body, or headers.
It is now possible to conditionally add this information for debugging and audit purposes.
Here is an example of how you can configure it:

telemetry:
  experimental_logging:
    format: json # By default it's "pretty" if you are in an interactive shell session
    display_filename: true # Display filename where the log is coming from. Default: true
    display_line_number: false # Display line number in the file where the log is coming from. Default: true
    # If one of these headers matches we will log supergraph and subgraphs requests/responses
    when_header:
      - name: apollo-router-log-request
        value: my_client
        headers: true # default: false
        body: true # default: false
      # log request/response headers for all requests coming from iPhones
      - name: user-agent
        match: ^Mozilla/5.0 \(iPhone
        headers: true

By @bnjjj in #2040

Provide multi-arch (amd64/arm64) Docker images for the Router (Issue #1932)

From 1.5.0 our Docker images will be multi-arch.

By @garypen in #2138

Add a supergraph configmap option to the helm chart (PR #2119)

Adds the capability to create a configmap containing your supergraph schema. Here's an example of how you could make use of this from your values.yaml and with the helm install command.

extraEnvVars:
  - name: APOLLO_ROUTER_SUPERGRAPH_PATH
    value: /data/supergraph-schema.graphql

extraVolumeMounts:
  - name: supergraph-schema
    mountPath: /data
    readOnly: true

extraVolumes:
  - name: supergraph-schema
    configMap:
      name: "{{ .Release.Name }}-supergraph"
      items:
        - key: supergraph-schema.graphql
          path: supergraph-schema.graphql

With that values.yaml content, and with your supergraph schema in a file named supergraph-schema.graphql, you can execute:

helm upgrade --install --create-namespace --namespace router-test --set-file supergraphFile=supergraph-schema.graphql router-test oci://ghcr.io/apollographql/helm-charts/router --version 1.0.0-rc.9 --values values.yaml

By @garypen in #2119

Configuration upgrades (Issue #2123)

Occasionally we will make changes to the Router yaml configuration format.
When starting the Router, if the configuration can be upgraded, it will do so automatically and display a warning:

2022-11-22T14:01:46.884897Z  WARN router configuration contains deprecated options: 

  1. telemetry.tracing.trace_config.attributes.router has been renamed to 'supergraph' for consistency

These will become errors in the future. Run `router config upgrade <path_to_router.yaml>` to see a suggested upgraded configuration.

Note: If a configuration has errors after upgrading then the configuration will not be upgraded automatically.

From the CLI users can run:

router config upgrade <path_to_router.yaml> to output configuration that has been upgraded to match the latest config format.
router config upgrade --diff <path_to_router.yaml> to output a diff e.g.

 telemetry:
   apollo:
     client_name_header: apollographql-client-name
   metrics:
     common:
       attributes:
-        router:
+        supergraph:
           request:
             header:
             - named: "1" # foo

There are situations where comments and whitespace are not preserved.

By @bryncooke in #2116, #2162

Experimental 🥼 subgraph request retry (Issue #338, Issue #1956)

Implements subgraph request retries, using Finagle's retry buckets algorithm:

it defines a minimal number of retries per second (min_per_sec, default is 10 retries per second) to bootstrap the system or for low-traffic deployments
for each successful request, we add a "token" to the bucket; those tokens expire after ttl (default: 10 seconds)
the number of available additional retries is a fraction of the number of tokens, defined by retry_percent (default is 0.2)

Request retries are disabled by default on mutations.

This is activated in the traffic_shaping plugin, either globally or per subgraph:

traffic_shaping:
  all:
    experimental_retry:
      min_per_sec: 10
      ttl: 10s
      retry_percent: 0.2
      retry_mutations: false
  subgraphs:
    accounts:
      experimental_retry:
        min_per_sec: 20
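A simplified, std-only Rust model of the budget described above: a reserve allows min_per_sec × ttl retries per window, and each successful request recorded within the last ttl earns retry_percent of an additional retry. `RetryBudget` is hypothetical, time is passed in explicitly as milliseconds to keep it deterministic, and Finagle's actual token accounting differs in detail:

```rust
use std::collections::VecDeque;

/// Simplified retry budget; timestamps are milliseconds supplied by the caller.
struct RetryBudget {
    ttl_ms: u64,
    min_per_sec: u32,
    retry_percent: f64,
    deposits: VecDeque<u64>,    // successful requests within the last `ttl`
    withdrawals: VecDeque<u64>, // retries performed within the last `ttl`
}

impl RetryBudget {
    fn new(ttl_ms: u64, min_per_sec: u32, retry_percent: f64) -> Self {
        Self {
            ttl_ms,
            min_per_sec,
            retry_percent,
            deposits: VecDeque::new(),
            withdrawals: VecDeque::new(),
        }
    }

    /// Forgets deposits and withdrawals that are at least `ttl` old.
    fn expire(&mut self, now_ms: u64) {
        let ttl = self.ttl_ms;
        for events in [&mut self.deposits, &mut self.withdrawals] {
            while let Some(&t) = events.front() {
                if now_ms.saturating_sub(t) >= ttl {
                    events.pop_front();
                } else {
                    break;
                }
            }
        }
    }

    /// Records a successful request (adds a token).
    fn deposit(&mut self, now_ms: u64) {
        self.expire(now_ms);
        self.deposits.push_back(now_ms);
    }

    /// Returns true, and records the retry, if the budget currently allows one more.
    fn try_withdraw(&mut self, now_ms: u64) -> bool {
        self.expire(now_ms);
        let reserve = f64::from(self.min_per_sec) * self.ttl_ms as f64 / 1000.0;
        let earned = self.retry_percent * self.deposits.len() as f64;
        if (self.withdrawals.len() as f64) < reserve + earned {
            self.withdrawals.push_back(now_ms);
            true
        } else {
            false
        }
    }
}

fn main() {
    // Defaults from the notes above: min_per_sec 10, ttl 10s, retry_percent 0.2.
    let mut budget = RetryBudget::new(10_000, 10, 0.2);
    for t in 0..50 {
        budget.deposit(t);
    }
    let allowed = (0..200).filter(|_| budget.try_withdraw(100)).count();
    println!("retries allowed right now: {allowed}");
}
```

Tying extra retries to recent successes is what keeps retries from amplifying load when a subgraph is failing.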

By @Geal in #2006 and #2160

Experimental 🥼 Caching configuration (Issue #2075)

Split Redis cache configuration for APQ and query planning:

supergraph:
  apq:
    experimental_cache:
      in_memory:
        limit: 512
      redis:
        urls: ["redis://..."]
  query_planning:
    experimental_cache:
      in_memory:
        limit: 512
      redis:
        urls: ["redis://..."]

By @Geal in #2155

`@defer` Apollo tracing support (Issue #1600)

Added Apollo tracing support for queries that use @defer. You can now view traces in Apollo Studio as normal.

By @bryncooke in #2190

🐛 Fixes

Router debug Docker images now run under the control of heaptrack (Issue #2135)

From 1.5.0, our debug Docker image will invoke the router under the control of heaptrack. We are making this change to make it simple for users to investigate potential memory issues with the Router.

Do not run debug images in performance sensitive contexts. The tracking of memory allocations will significantly impact performance. In general, the debug image should only be used in consultation with Apollo engineering and support.

Look at our documentation for examples of how to use the image in either Docker or Kubernetes.

By @garypen in #2142

Fix panic when dev mode enabled with empty config file (Issue #2182)

If you run the Router in dev mode with an empty config file, it will no longer panic.

By @bnjjj in #2195

Fix missing apollo tracing variables (Issue #2186)

The send_variable_values setting had no effect. This is now fixed.

telemetry:
  apollo:
    send_variable_values: all

By @bryncooke in #2190

Fix build_docker_image.sh script when using default repo (PR #2163)

Adding the -r flag recently broke the existing functionality to build from the default repo using -b. This fixes that.

By @garypen in #2163

Improve errors when subgraph returns non-GraphQL response with a non-2xx status code (Issue #2117)

The error response will now contain the status code and status name. Example: HTTP fetch failed from 'my-service': 401 Unauthorized
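The message format can be reproduced with a small Rust helper that pairs the status code with its canonical reason phrase. `fetch_error` is hypothetical and covers only a handful of RFC 9110 reason phrases; a real implementation would use a complete table, such as the one behind `http::StatusCode::canonical_reason`:

```rust
/// A few canonical reason phrases from RFC 9110; unknown codes get an empty phrase.
fn reason_phrase(status: u16) -> &'static str {
    match status {
        400 => "Bad Request",
        401 => "Unauthorized",
        403 => "Forbidden",
        404 => "Not Found",
        500 => "Internal Server Error",
        502 => "Bad Gateway",
        503 => "Service Unavailable",
        _ => "",
    }
}

/// Builds the client-facing message for a non-GraphQL, non-2xx subgraph response.
fn fetch_error(service: &str, status: u16) -> String {
    format!("HTTP fetch failed from '{service}': {status} {}", reason_phrase(status))
        .trim_end()
        .to_string()
}

fn main() {
    println!("{}", fetch_error("my-service", 401));
}
```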

By @col in #2118

Handle mutations containing `@defer` (Issue #2099)

The Router generates partial query shapes corresponding to the primary and deferred responses,
to validate the data sent back to the client. Those query shapes were invalid for mutations.

By @Geal in #2102

Experimental 🥼 APQ and query planner Redis caching fixes (PR #2176)

use a null byte as separator in Redis keys
handle Redis c...


15 Nov 15:56

apollo-bot2

v1.4.0

518efbe


🚀 Features

Add support for returning different HTTP status codes in Rhai (Issue #2023)

It is now possible to return different HTTP status codes when raising an exception in Rhai. You do this by providing an object map with two keys: status and message, rather than merely a string as was the case previously.

throw #{
    status: 403,
    message: "I have raised a 403"
};

This example will short-circuit request/response processing and return with an HTTP status code of 403 to the client and also set the error message accordingly.

It is still possible to return errors using the current pattern, which will continue to return HTTP status code 500 as previously:

throw "I have raised an error";

It is not currently possible to return a 200 status code using this pattern. If you try, it will be implicitly converted into a 500 error.
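The rules above can be modeled in Rust as a mapping from the thrown value to a status code and message. `Thrown` and `to_http_error` are hypothetical stand-ins for what happens inside the Rhai plugin, encoding only the behavior stated in these notes:

```rust
/// What a Rhai script can throw, per the rules above (hypothetical model).
enum Thrown {
    Message(String),
    StatusAndMessage { status: u16, message: String },
}

/// Maps a thrown value to the (status, message) returned to the client.
fn to_http_error(thrown: Thrown) -> (u16, String) {
    match thrown {
        // A plain string keeps the previous behavior: HTTP 500.
        Thrown::Message(message) => (500, message),
        // 200 cannot be returned this way and is converted into a 500.
        Thrown::StatusAndMessage { status: 200, message } => (500, message),
        Thrown::StatusAndMessage { status, message } => (status, message),
    }
}

fn main() {
    let thrown = Thrown::StatusAndMessage { status: 403, message: "I have raised a 403".into() };
    println!("{:?}", to_http_error(thrown));
}
```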

By @garypen in #2097

Add support for `urlencode()` / `urldecode()` in Rhai (Issue #2052)

Two new functions, urlencode() and urldecode() may now be used to URL-encode or URL-decode strings, respectively.
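As a rough illustration of what URL-encoding does, here is a Rust sketch of percent-encoding and decoding. It assumes the RFC 3986 "unreserved" characters (A-Z, a-z, 0-9, '-', '.', '_', '~') are left untouched and encodes spaces as %20 rather than '+'. The exact character set used by the Router's Rhai functions is an assumption, not taken from the release notes.

```rust
/// Percent-encode every byte outside the RFC 3986 unreserved set.
/// (Assumed character set; the Router's urlencode() may differ in detail.)
fn url_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            // Unreserved characters pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else, including each byte of multi-byte UTF-8, becomes %XX.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Decode %XX escapes, returning None on malformed input (truncated escapes,
/// non-hex digits, or decoded bytes that are not valid UTF-8).
fn url_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Expect exactly two hex digits after '%'.
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Round-tripping non-ASCII text works because each UTF-8 byte is escaped individually and reassembled before validation.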

By @garypen in #2053

Experimental 🥼 External cache storage in Redis (PR #2024)

We are experimenting with introducing external storage for caches in the Router, which will provide a foundation for caching things like automated persisted queries (APQ) amongst other future-looking ideas. Our initial implementation supports a multi-level cache hierarchy, first attempting an in-memory LRU cache, followed by a Redis Cluster backend.

As this is still experimental, it is only available as an opt-in through a Cargo feature-flag.
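The lookup order of such a multi-level cache can be sketched in Rust as follows. This is a simplified illustration, not the Router's implementation. The RemoteCache trait and FakeRedis type are hypothetical stand-ins for a real Redis client. The sketch assumes that an L1 miss consults L2 and promotes any hit into memory, and that writes go to both levels.

```rust
use std::collections::{HashMap, VecDeque};

/// Stand-in for a remote cache backend such as Redis (hypothetical trait).
trait RemoteCache {
    fn get(&self, key: &str) -> Option<String>;
    fn insert(&mut self, key: String, value: String);
}

/// In-memory substitute for a Redis client, used only for this sketch.
#[derive(Default)]
struct FakeRedis {
    data: HashMap<String, String>,
}

impl RemoteCache for FakeRedis {
    fn get(&self, key: &str) -> Option<String> {
        self.data.get(key).cloned()
    }
    fn insert(&mut self, key: String, value: String) {
        self.data.insert(key, value);
    }
}

/// Two-level cache: a bounded in-memory LRU (L1) in front of a remote store (L2).
struct TwoLevelCache<R: RemoteCache> {
    capacity: usize,
    l1: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used key
    l2: R,
}

impl<R: RemoteCache> TwoLevelCache<R> {
    fn new(capacity: usize, l2: R) -> Self {
        assert!(capacity > 0, "L1 capacity must be at least 1");
        Self { capacity, l1: HashMap::new(), order: VecDeque::new(), l2 }
    }

    /// Mark `key` as most recently used.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }

    /// Insert into L1, evicting the least recently used entry when full.
    fn put_l1(&mut self, key: String, value: String) {
        if self.l1.contains_key(&key) {
            self.touch(&key);
        } else {
            if self.l1.len() == self.capacity {
                if let Some(evicted) = self.order.pop_front() {
                    self.l1.remove(&evicted);
                }
            }
            self.order.push_back(key.clone());
        }
        self.l1.insert(key, value);
    }

    /// Look in memory first, then fall back to the remote store and promote hits.
    fn get(&mut self, key: &str) -> Option<String> {
        if let Some(value) = self.l1.get(key).cloned() {
            self.touch(key);
            return Some(value);
        }
        let value = self.l2.get(key)?;
        self.put_l1(key.to_string(), value.clone());
        Some(value)
    }

    /// Write through to both levels.
    fn insert(&mut self, key: String, value: String) {
        self.l2.insert(key.clone(), value.clone());
        self.put_l1(key, value);
    }
}
```

Because an entry evicted from memory still lives in the remote store, a later lookup is served from L2 instead of being recomputed.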

By @garypen and @Geal in #2024

Expose `query_plan` to `ExecutionRequest` in Rhai (PR #2081)

You can now read the query-plan from an execution request by accessing request.query_plan. Additionally, request.context also now supports the Rhai in keyword.

By @garypen in #2081

🐛 Fixes

Move error messages about nullifying into `extensions` (Issue #2071)

The Router was previously creating and returning error messages in errors when nullability rules had been triggered (e.g., when a non-nullable field was null, it nullifies the parent object). These are now emitted into a valueCompletion portion of the extensions response.

Adding those messages to the list of errors was potentially redundant and caused failures in clients (such as Apollo Client with its default error policy) which would otherwise have expected nullified fields as part of normal operation execution. Additionally, the subgraph could already add such an error message indicating why a field was null, which would cause the error to be doubled.

By @Geal in #2077

Fix `Float` input-type coercion for default values with values larger than 32-bit (Issue #2087)

A regression has been fixed which caused the Router to reject integers larger than 32-bits used as the default values on Float fields in input types.

In other words, the following will once again work as expected:

input MyInputType {
    a_float_input: Float = 9876543210
}
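The difference between the regressed and fixed behaviour can be modelled in a few lines of Rust. This is a simplified model only. The release notes do not describe the Router's actual code path, and both function names are hypothetical. The key point is that 9876543210 does not fit in a 32-bit integer but fits comfortably in 64 bits and converts to f64 exactly.

```rust
/// Simplified model of the regression: integer literals were effectively limited
/// to 32 bits, so larger values could not serve as Float default values.
fn coerce_default_32_bit(literal: &str) -> Option<f64> {
    literal.parse::<i32>().ok().map(f64::from)
}

/// Simplified model of the fix: accept integers up to 64 bits and widen to f64.
/// (Integers with magnitude above 2^53 lose precision when converted to f64.)
fn coerce_default_64_bit(literal: &str) -> Option<f64> {
    literal.parse::<i64>().ok().map(|n| n as f64)
}
```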

By @o0Ignition0o in #2090

Assume `Accept: application/json` when no `Accept` header is present (Issue #1990)

An absent Accept header means */*. Despite previous efforts to fix this, we were still not always doing the correct thing.

By @bnjjj in #2078

`@skip` and `@include` implementation for root-level fragment use (Issue #2072)

The @skip and @include directives are now implemented for both inline fragments and fragment spreads at the top-level of operations.
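The inclusion rule these directives follow comes from the GraphQL specification: a selection is skipped if @skip(if: true) is present, or if @include(if: false) is present. A minimal Rust sketch of that rule is shown below. The Directives struct is a hypothetical representation in which each directive's `if` argument has already been resolved from variables.

```rust
/// Resolved directive arguments on a fragment spread or inline fragment.
/// `None` means the directive is absent (hypothetical type for illustration).
struct Directives {
    skip_if: Option<bool>,
    include_if: Option<bool>,
}

/// GraphQL spec rule: include the selection unless @skip(if: true) is present
/// or @include(if: false) is present.
fn should_include(d: &Directives) -> bool {
    let skipped = d.skip_if == Some(true);
    let included = d.include_if.unwrap_or(true);
    !skipped && included
}
```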

By @Geal in #2096

🛠 Maintenance

Use `debian:bullseye-slim` as our base Docker image (PR #2085)

A while ago, when we added compression support to the router, we discovered that the Distroless base-images we were using didn't ship with a copy of libz.so.1. We addressed that problem by copying in a version of the library from the Distroless image (Java) which does ship it. While that worked, we found challenges in adding support for both aarch64 and amd64 Docker images that would make it less than ideal to continue using those Distroless images.

Rather than persist with this complexity, we've concluded that it would be better to just use a base image which ships with libz.so.1, hence the change to debian:bullseye-slim. Those images are still quite minimal and the resulting images are similar in size.

By @garypen in #2085

Update `apollo-parser` to `v0.3.2` (PR #2103)

This updates our dependency on our apollo-parser package which brings a few improvements, including more defensive parsing of some operations. See its CHANGELOG in the apollo-rs repository for more details.

By @abernix in #2103

📚 Documentation

Fix example `helm show values` command (PR #2088)

The helm show values command needs to use the correct Helm chart reference oci://ghcr.io/apollographql/helm-charts/router.

By @col in #2088

Contributors

garypen, Geal, and 4 other contributors


09 Nov 12:41

apollo-bot2

v1.3.0

bd9035f

v1.3.0

🚀 Features

Add support for DHAT-based heap profiling (PR #1829)

The dhat-rs crate provides DHAT-style heap profiling. We have added two compile-time features, dhat-heap and dhat-ad-hoc, which leverage this ability.

By @garypen in #1829

Add `trace_id` in logs to correlate entries from the same request (Issue #1981)

A trace_id is now added to each log line to help correlate log entries to specific requests. The value for this property will be automatically inherited from any enabled distributed tracing headers, such as those listed in our Tracing propagation header documentation (e.g., Jaeger, Zipkin, Datadog, etc.).

In the event that a trace_id was not inherited from a propagated header, the Router will originate a trace_id and also propagate that value to subgraphs to enable tracing in subgraphs.

Here is an example of the trace_id appearing in plain-text log output:

2022-10-21T15:17:45.562553Z ERROR [trace_id=5e6a6bda8d0dca26e5aec14dafa6d96f] apollo_router::services::subgraph_service: fetch_error="hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 111, kind: ConnectionRefused, message: \"Connection refused\" }))"
2022-10-21T15:17:45.565768Z ERROR [trace_id=5e6a6bda8d0dca26e5aec14dafa6d96f] apollo_router::query_planner::execution: Fetch error: HTTP fetch failed from 'accounts': HTTP fetch failed from 'accounts': error trying to connect: tcp connect error: Connection refused (os error 111)

And an example of the trace_id appearing in JSON-formatted log output in a similar scenario:

{"timestamp":"2022-10-26T15:39:01.078260Z","level":"ERROR","fetch_error":"hyper::Error(Connect, ConnectError(\"tcp connect error\", Os { code: 111, kind: ConnectionRefused, message: \"Connection refused\" }))","target":"apollo_router::services::subgraph_service","filename":"apollo-router/src/services/subgraph_service.rs","line_number":182,"span":{"name":"subgraph"},"spans":[{"trace_id":"5e6a6bda8d0dca26e5aec14dafa6d96f","name":"request"},{"name":"supergraph"},{"name":"execution"},{"name":"parallel"},{"name":"fetch"},{"name":"subgraph"}]}
{"timestamp":"2022-10-26T15:39:01.080259Z","level":"ERROR","message":"Fetch error: HTTP fetch failed from 'accounts': HTTP fetch failed from 'accounts': error trying to connect: tcp connect error: Connection refused (os error 111)","target":"apollo_router::query_planner::execution","filename":"apollo-router/src/query_planner/execution.rs","line_number":188,"span":{"name":"parallel"},"spans":[{"trace_id":"5e6a6bda8d0dca26e5aec14dafa6d96f","name":"request"},{"name":"supergraph"},{"name":"execution"},{"name":"parallel"}]}
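The inherit-or-originate rule described above can be sketched in Rust. This example uses the W3C Trace Context traceparent header (version-traceid-parentid-flags) as one possible propagation format. The Router supports several formats, and choosing this one, along with both function names and the caller-supplied generator, is an assumption made for illustration.

```rust
/// Extract the trace ID from a W3C `traceparent` header value, whose format is
/// `version-traceid-parentid-flags`, e.g. `00-<32 hex>-<16 hex>-<2 hex>`.
/// Returns None if the value is malformed or the trace ID is all zeros.
fn trace_id_from_traceparent(header: &str) -> Option<&str> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 {
        return None;
    }
    let trace_id = parts[1];
    let is_lower_hex = trace_id.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    let is_all_zero = trace_id.bytes().all(|b| b == b'0');
    if trace_id.len() == 32 && is_lower_hex && !is_all_zero {
        Some(trace_id)
    } else {
        None
    }
}

/// Inherit a propagated trace ID when one is present, otherwise originate one.
/// `generate` is a caller-supplied (hypothetical) ID generator.
fn resolve_trace_id(traceparent: Option<&str>, generate: impl FnOnce() -> String) -> String {
    traceparent
        .and_then(trace_id_from_traceparent)
        .map(str::to_owned)
        .unwrap_or_else(generate)
}
```

The originated ID would then be propagated to subgraphs in the same header so that their spans join the same trace.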

By @bnjjj in #1982

Reload configuration when receiving the SIGHUP signal (Issue #35)

The Router will now reload its configuration when receiving the SIGHUP signal. This signal is only supported on *nix platforms,
and only when a configuration file was passed to the Router initially at startup.

By @Geal in #2015

🐛 Fixes

Fix the deduplication logic in deduplication caching (Issue #1984)

Under load, we found it was possible to break the router de-duplication logic and leave orphaned entries in the waiter map. This fixes the de-duplication logic to prevent this from occurring.
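The release notes do not describe how the fix works. One common way to guarantee that a waiter map never keeps orphaned entries is to tie cleanup to a guard's Drop, so that removal happens on every exit path. The Rust sketch below shows that RAII technique. WaiterMap, InFlight and register are hypothetical names, not the Router's code.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Query key -> number of follower requests waiting on the in-flight fetch.
type WaiterMap = Arc<Mutex<HashMap<String, usize>>>;

/// Held by the leader request that performs the real fetch.
struct InFlight {
    key: String,
    waiters: WaiterMap,
}

impl Drop for InFlight {
    // Removing the entry in Drop runs on success, error, early return, panic
    // unwinding or cancellation, so the map cannot keep orphaned entries.
    fn drop(&mut self) {
        if let Ok(mut map) = self.waiters.lock() {
            map.remove(&self.key);
        }
    }
}

/// Returns Some(guard) for the first request with this key (the leader) and
/// None for followers, which would wait for the leader's result.
fn register(waiters: &WaiterMap, key: &str) -> Option<InFlight> {
    let mut map = waiters.lock().unwrap();
    match map.entry(key.to_string()) {
        Entry::Occupied(mut entry) => {
            *entry.get_mut() += 1;
            None
        }
        Entry::Vacant(entry) => {
            entry.insert(0);
            Some(InFlight { key: key.to_string(), waiters: Arc::clone(waiters) })
        }
    }
}
```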

By @garypen in #2014

Follow back-off instructions from Studio Uplink (Issue #1494 Issue #1539)

When operating in a Managed Federation configuration and fetching the supergraph from Apollo Uplink, the Router will now react differently depending on the response from Apollo Uplink, rather than retrying incessantly. Specifically, it will:

Not attempt to retry when met with unrecoverable conditions (e.g., a graph that does not exist).
Back off on retries when the infrastructure asks for a longer retry interval.
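The decision logic described above can be sketched in Rust. UplinkResponse and next_attempt are hypothetical names, and so is the notion of a fixed default polling interval. The sketch only illustrates two rules: stop on unrecoverable errors, and never poll faster than a longer interval requested by the server.

```rust
use std::time::Duration;

/// Simplified outcome of a fetch from Apollo Uplink (hypothetical type).
enum UplinkResponse {
    /// A usable response was received.
    Ok,
    /// The request can never succeed, e.g. the graph does not exist.
    Unrecoverable,
    /// A retryable failure, optionally with a retry delay requested by Uplink.
    Retry { requested_delay: Option<Duration> },
}

/// Returns when to poll Uplink next, or None to stop retrying altogether.
fn next_attempt(response: &UplinkResponse, default_interval: Duration) -> Option<Duration> {
    match response {
        UplinkResponse::Ok => Some(default_interval),
        UplinkResponse::Unrecoverable => None,
        // Honour a longer interval requested by the infrastructure, but never
        // poll more often than the default interval.
        UplinkResponse::Retry { requested_delay: Some(delay) } => {
            Some((*delay).max(default_interval))
        }
        UplinkResponse::Retry { requested_delay: None } => Some(default_interval),
    }
}
```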

By @Geal in #2001

Fix the rhai SDL `print` function (Issue #2005)

Fixes the print function exposed to rhai, which was broken by a recent change to the way we pass SDL (schema definition language) to plugins.

By @fernando-apollo in #2007

Export `router_factory::Endpoint` (PR #2007)

We now export the router_factory::Endpoint struct, which had inadvertently been left unexported. Without access to this struct, it was not possible to implement the web_endpoints trait in plugins.

By @scottdouglas1989 in #2007

Validate default values for input object fields (Issue #1979)

When validating variables, the Router now uses graph-specified default values for object fields, if applicable.

By @Geal in #2003

Address regression when sending gRPC to `localhost` (Issue #2036)

We again support sending unencrypted gRPC tracing and metrics data to localhost. This follows up on a regression introduced in the previous release, which had addressed a limitation that prevented sending gRPC to TLS-secured endpoints.

Applying a proper fix was complicated by an upstream issue (opentelemetry-rust#908) which incorrectly assumes https in the absence of a more specific protocol/scheme, contrary to the OpenTelemetry specification, which indicates otherwise.

The Router will now detect and work-around this upstream issue by explicitly setting the full, correct endpoint URLs when not specified in config.

In addition:

Basic TLS encryption will be enabled when the endpoint scheme is explicitly https.
A warning will be emitted if the endpoint port is 443 but no TLS config is specified since most traffic on port 443 is expected to be encrypted.
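The endpoint rules above can be summarised in a small Rust sketch. EndpointPlan and plan_endpoint are hypothetical names. The default of http://localhost:4317 is the OTLP/gRPC default from the OpenTelemetry specification and is assumed here rather than taken from these notes. URL handling is deliberately naive; for example, IPv6 hosts are not handled.

```rust
/// Default OTLP/gRPC endpoint per the OpenTelemetry specification (assumed here).
const DEFAULT_OTLP_GRPC_ENDPOINT: &str = "http://localhost:4317";

/// How the exporter would be set up (hypothetical type).
#[derive(Debug, PartialEq)]
struct EndpointPlan {
    url: String,
    enable_tls: bool,
    warn_port_443_without_tls: bool,
}

/// `has_tls_config` says whether explicit TLS settings were provided.
fn plan_endpoint(configured: Option<&str>, has_tls_config: bool) -> EndpointPlan {
    // When nothing is configured, spell out the full URL (including the http
    // scheme) rather than letting the exporter assume https.
    let url = configured.unwrap_or(DEFAULT_OTLP_GRPC_ENDPOINT).to_string();
    // Basic TLS only when the scheme is explicitly https.
    let enable_tls = url.starts_with("https://");
    // Host[:port] portion between the scheme and the first '/'.
    let authority = url.split_once("://").map_or(url.as_str(), |(_, rest)| rest);
    let authority = authority.split('/').next().unwrap_or(authority);
    let on_port_443 = authority.rsplit_once(':').map_or(false, |(_, port)| port == "443");
    EndpointPlan {
        enable_tls,
        // Port 443 traffic is expected to be encrypted, so flag a missing TLS config.
        warn_port_443_without_tls: on_port_443 && !has_tls_config,
        url,
    }
}
```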

By @BrynCooke in #2048

🛠 Maintenance

Apply Tower best-practice to "inner" Service cloning (PR #2030)

We found our Service readiness checks could be improved by following the Tower project's recommendations for cloning inner Services.
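Tower's documentation for its Service trait warns against calling a clone of an inner service that was never polled for readiness. Instead, it recommends swapping the ready instance out with std::mem::replace and leaving a fresh clone behind. The Rust sketch below models that idea without the tower crate. Inner and Outer are hypothetical, and the readiness flag stands in for poll_ready. In real Tower code the pattern matters because the response future needs to own its service handle.

```rust
/// Stand-in for an inner service whose readiness belongs to one instance
/// (hypothetical; in real code this would be a tower::Service).
struct Inner {
    ready: bool,
}

impl Clone for Inner {
    // A fresh clone has not been polled, so it is not ready yet.
    fn clone(&self) -> Self {
        Inner { ready: false }
    }
}

impl Inner {
    fn poll_ready(&mut self) -> bool {
        self.ready = true;
        true
    }

    fn call(&mut self, req: u32) -> u32 {
        assert!(self.ready, "call() on an instance that was not polled ready");
        self.ready = false; // readiness is consumed by the call
        req * 2
    }
}

struct Outer {
    inner: Inner,
}

impl Outer {
    fn poll_ready(&mut self) -> bool {
        self.inner.poll_ready()
    }

    fn call(&mut self, req: u32) -> u32 {
        // Take the instance that was polled ready and leave a clone behind for the
        // next poll_ready, instead of calling a clone that was never polled.
        let clone = self.inner.clone();
        let mut ready = std::mem::replace(&mut self.inner, clone);
        ready.call(req)
    }
}
```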

By @garypen in #2030

Split the configuration file implementation into modules (Issue #1790)

The internals of the implementation for the configuration have been modularized to facilitate on-going development. There should be no impact to end-users who are only using YAML to configure their Router.

By @Geal in #1996

Apply traffic-shaping directly to `supergraph` and `subgraph` (PR #2034)

The plugin infrastructure works on BoxService instances and makes no guarantee on plugin ordering. The traffic shaping plugin needs a cloneable inner service, and should run right before calling the underlying service. We've changed the traffic plugin application so it can work directly on the underlying service. The configuration remains the same since this is still implemented as a plugin.

By @Geal in #2034

📚 Documentation

Remove references to Git submodules from `DEVELOPMENT.md` (Issue #2012)

We've removed the instructions from our development documentation which guide users to familiarize themselves with and clone Git submodules when working on the Router source itself. This follows-up on the removal of the modules themselves in PR #1856.

By @garypen in #2045

Contributors

garypen, Geal, and 4 other contributors


25 Oct 16:09

apollo-bot2

v1.2.1

b6359aa

v1.2.1

🐛 Fixes

Update to Federation v2.1.4 (PR #1994)

In addition to general Federation bug-fixes, this update should resolve a case (seen in Issue #1962) where @defer directives that had previously been present in a supergraph caused a startup failure in the Router while it was generating an API schema with @defer.

By @abernix in #1994

Assume `Accept: application/json` when no `Accept` header is present (Issue #1995)

The Accept header means */* when it is absent.

By @Geal in #1995

Fix OpenTelemetry OTLP gRPC (Issue #1976)

OpenTelemetry Protocol (OTLP) gRPC failures involving TLS errors when exporting to external APMs, including Datadog, New Relic and Honeycomb.io, have been resolved.

By @BrynCooke in #1977

Prefix the Prometheus metrics with `apollo_router_` (Issue #1915)

Correctly prefix Prometheus metrics with the apollo_router prefix, per convention.

- http_requests_error_total{message="cannot contact the subgraph",service_name="apollo-router",subgraph="my_subgraph_name_error",subgraph_error_extended_type="SubrequestHttpError"} 1
+ apollo_router_http_requests_error_total{message="cannot contact the subgraph",service_name="apollo-router",subgraph="my_subgraph_name_error",subgraph_error_extended_type="SubrequestHttpError"} 1

By @bnjjj in #1971 & #1987

Fix `--hot-reload` in Kubernetes and Docker (Issue #1476)

The --hot-reload flag now chooses a file event notification mechanism at runtime. The exact mechanism is determined by the notify crate.

By @garypen in #1964

Fix a coercion rule that failed to validate 64-bit integers (PR #1951)

Queries that passed 64-bit integers for Float input variables were failing to validate despite being valid.

By @o0Ignition0o in #1951

Prometheus: make sure `apollo_router_http_requests_error_total` and `apollo_router_http_requests_total` are incremented. (PR #1953)

This affected two different metrics differently:

The apollo_router_http_requests_error_total metric only incremented for requests that would be an INTERNAL_SERVER_ERROR in the Router (the service stack returning a BoxError). This meant that GraphQL validation errors did not increment this counter.
The apollo_router_http_requests_total metric would only increment for successful requests despite the fact that the Prometheus documentation suggests this should be incremented regardless of whether the request succeeded or not.

This PR makes sure we always increment apollo_router_http_requests_total and we increment apollo_router_http_requests_error_total when the status code is 4xx or 5xx.
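The corrected counting rule is simple enough to state as code. In the Rust sketch below, plain integers stand in for the two Prometheus counters. The struct and method names are hypothetical. Every request increments the total, and only 4xx and 5xx status codes increment the error counter.

```rust
/// Plain-integer stand-ins for apollo_router_http_requests_total and
/// apollo_router_http_requests_error_total (hypothetical type, not Prometheus).
#[derive(Default, Debug, PartialEq)]
struct HttpRequestCounters {
    requests_total: u64,
    requests_error_total: u64,
}

impl HttpRequestCounters {
    /// Record one finished request with its HTTP status code.
    fn record(&mut self, status: u16) {
        // Every request counts towards the total, whether it succeeded or not.
        self.requests_total += 1;
        // Only 4xx and 5xx responses count as errors.
        if (400..600).contains(&status) {
            self.requests_error_total += 1;
        }
    }
}
```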

By @o0Ignition0o in #1953

Set `no_delay` and `keepalive` on subgraph requests (Issue #1905)

This re-introduces these parameters which were incorrectly removed in a previous pull request.

By @Geal in #1910

🛠 Maintenance

Improve the stability of some flaky tests (PR #1972)

The trace and rate limiting tests have been sporadically failing in our CI environment. The root cause was a race-condition in the tests so the tests have been made more resilient to reduce the number of failures.

By @garypen in #1972 and #1974

Update `docker-compose` and `Dockerfile`s now that the submodules have been removed (PR #1950)

We recently removed Git submodules from this repository but we didn't update various docker-compose.yml files.

This PR adds new Dockerfiles and updates existing docker-compose.yml files so we can run integration tests (and the fuzzer) without needing to git clone and set up the Federation and federation-demo repositories.

By @o0Ignition0o in #1950

Fix logic around `Accept` headers and multipart responses (PR #1923)

If the Accept header contained multipart/mixed, even with other alternatives like application/json,
a query with a single response was still sent as multipart, which made Apollo Studio Explorer fail on the initial introspection query.

This changes the logic so that:

If the client has indicated an accept of application/json or */* and there is a single response, it will be delivered as content-type: application/json.
If there are multiple responses or the client only accepts multipart/mixed, we will send content-type: multipart/mixed response. This will occur even if there is only one response.
Otherwise, we will return an HTTP status code of 406 Not Acceptable.
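The three rules above can be sketched as a small Rust function. ResponseFormat and negotiate are hypothetical names. The sketch treats an absent Accept header as */* and ignores quality values. It also assumes one interpretation of the second rule: multipart is sent only if the client accepts multipart/mixed, either explicitly or via */*.

```rust
/// Outcome of content negotiation (hypothetical type).
#[derive(Debug, PartialEq)]
enum ResponseFormat {
    Json,          // content-type: application/json
    Multipart,     // content-type: multipart/mixed
    NotAcceptable, // HTTP 406 Not Acceptable
}

/// Choose the response format from the Accept header and whether the operation
/// produces more than one response. An absent header is treated as */*.
fn negotiate(accept: Option<&str>, multiple_responses: bool) -> ResponseFormat {
    let accept = accept.unwrap_or("*/*");
    // Media types with parameters such as `;q=0.9` stripped (q-values ignored).
    let types: Vec<&str> = accept
        .split(',')
        .map(|t| t.split(';').next().unwrap_or("").trim())
        .collect();
    let accepts_json = types.iter().any(|t| *t == "application/json" || *t == "*/*");
    let accepts_multipart = types.iter().any(|t| *t == "multipart/mixed" || *t == "*/*");
    if accepts_json && !multiple_responses {
        ResponseFormat::Json
    } else if accepts_multipart {
        ResponseFormat::Multipart
    } else {
        ResponseFormat::NotAcceptable
    }
}
```

With this logic, a client such as Apollo Studio Explorer that sends both multipart/mixed and application/json receives plain JSON for a single introspection response.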

By @Geal in #1923

`@defer`: duplicated errors across incremental items (Issue #1834, Issue #1818)

If a deferred response contains incremental responses, the errors should be dispatched in each increment according to the error's path.
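One way to dispatch errors by path is sketched in Rust below. Each error is attached to the increment with the longest path that is a prefix of the error's path, instead of being copied into every increment. Increment and dispatch_errors are hypothetical, simplified types, and returning unmatched errors to the caller is an assumption of the sketch.

```rust
/// A deferred incremental item and the response path it is rooted at
/// (hypothetical, simplified types; path segments are field names or indices).
struct Increment {
    path: Vec<String>,
    errors: Vec<String>,
}

/// Route each (error path, message) pair to the increment with the longest path
/// that is a prefix of the error's path. Errors matching no increment are
/// returned to the caller.
fn dispatch_errors(increments: &mut [Increment], errors: Vec<(Vec<String>, String)>) -> Vec<String> {
    let mut unmatched = Vec::new();
    for (error_path, message) in errors {
        let target = increments
            .iter_mut()
            .filter(|inc| error_path.starts_with(&inc.path))
            .max_by_key(|inc| inc.path.len());
        match target {
            Some(inc) => inc.errors.push(message),
            None => unmatched.push(message),
        }
    }
    unmatched
}
```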

By @Geal in #1892

Our Docker images are now linked to our GitHub repository per OCI-standards (PR #1958)

The org.opencontainers.image.source annotation has been added to our Dockerfiles and published Docker image in order to map the published image to our GitHub repository.

By @ndthanhdev in #1958

Contributors

garypen, Geal, and 5 other contributors


11 Oct 15:06

apollo-bot2

v1.2.0

afc9fee

v1.2.0

❗ BREAKING ❗

Note: the breaking change is not for the Router itself, but for the Router Helm chart, which is still 1.0.0-rc.5.

Remove support for `rhai.input_file` from the helm chart (Issue #1826)

The existing rhai.input_file mechanism doesn't really work for most helm use cases. This PR removes this mechanism and encourages the use of the extraVolumes/extraVolumeMounts mechanism with rhai.

Example: Create a configmap which contains your rhai scripts.

apiVersion: v1
kind: ConfigMap
metadata:
  name: rhai-config
  labels:
    app.kubernetes.io/name: rhai-config
    app.kubernetes.io/instance: rhai-config
data:
  main.rhai: |
    // Call map_request with our service and pass in a string with the name
    // of the function to callback
    fn subgraph_service(service, subgraph) {
        print(`registering request callback for ${subgraph}`);
        const request_callback = Fn("process_request");
        service.map_request(request_callback);
    }
  
    // This will convert all cookie pairs into headers.
    // If you only wish to convert certain cookies, you
    // can add logic to modify the processing.
    fn process_request(request) {
  
        // Find our cookies
        if "cookie" in request.headers {
            print("adding cookies as headers");
            let cookies = request.headers["cookie"].split(';');
            for cookie in cookies {
                // Split our cookies into name and value
                let k_v = cookie.split('=', 2);
                if k_v.len() == 2 {
                    // trim off any whitespace
                    k_v[0].trim();
                    k_v[1].trim();
                    // update our headers
                    // Note: we must update subgraph.headers, since we are
                    // setting a header in our sub graph request
                    request.subgraph.headers[k_v[0]] = k_v[1];
                }
            }
        } else {
            print("no cookies in request");
        }
    }
  my-module.rhai: |
    fn process_request(request) {
        print("processing a request");
    }

Note how the data represents multiple rhai source files. The module code isn't used; it's only there to illustrate multiple files in a single configmap.

With that configmap in place, the helm chart can be used with a values file that contains:

router:
  configuration:
    rhai:
      scripts: /dist/rhai
      main: main.rhai
extraVolumeMounts:
  - name: rhai-volume
    mountPath: /dist/rhai
    readOnly: true
extraVolumes:
  - name: rhai-volume
    configMap:
      name: rhai-config

The configuration tells the router to load the rhai script main.rhai from the directory /dist/rhai (and to load any imported modules from /dist/rhai).

This will mount the configmap created above in the /dist/rhai directory with two files:

main.rhai
my-module.rhai

By @garypen in #1917

🚀 Features

Expose the TraceId functionality to rhai (Issue #1935)

A new function, traceid(), is exposed to rhai scripts which may be used to retrieve a unique trace id for a request. The trace id is an opentelemetry span id.

fn supergraph_service(service) {
    try {
        let id = traceid();
        print(`id: ${id}`);
    }
    catch(err)
    {
        // log any errors
        log_error(`span id error: ${err}`);
    }
}

By @garypen in #1937

🐛 Fixes

Fix studio reporting failures (Issue #1903)

The root cause of the issue was letting the server component of spaceport close silently during a re-configuration or schema reload. This fixes the issue by keeping the server component alive as long as the client remains connected.

Additionally, recycled spaceport connections are now re-connected to spaceport to further ensure connection validity.

This also makes deadpool sizing constant across environments (#1893).

By @garypen in #1928

Update `apollo-parser` to v0.2.12 (PR #1921)

Correctly lexes and creates an error token for unterminated GraphQL StringValues with unicode and line terminator characters.

By @lrlna in #1921

`traffic_shaping.all.deduplicate_query` was not correctly set (PR #1901)

Due to a change in our traffic_shaping configuration, the deduplicate_query field for all subgraphs wasn't set correctly.

By @bnjjj in #1901

🛠 Maintenance

Fix hpa yaml for appropriate kubernetes versions (#1908)

Correct schema for autoscaling/v2beta2 and autoscaling/v2 api versions of the
HorizontalPodAutoscaler within the helm chart

By @damienpontifex in #1914

Contributors

garypen, damienpontifex, and 2 other contributors


30 Sep 11:31

apollo-bot2

v1.1.0

d184fdf

v1.1.0

🚀 Features

Build, test and publish binaries for `aarch64-unknown-linux-gnu` architecture (Issue #1192)

We're now testing and building aarch64-unknown-linux-gnu binaries in our release pipeline and publishing those build artifacts as releases. These will be installable in the same way as our existing installation instructions.

By @EverlastingBugstopper in #1907

Add ability to specify repository location in "DIY" Docker builds (PR #1904)

The new -r flag allows a developer to specify the location of a repository when building a diy docker image. Handy for developers with local repositories.

By @garypen in #1904

Support `serviceMonitor` in Helm chart

kube-prometheus-stack ignores scrape annotations, so a serviceMonitor custom resource (defined by a Custom Resource Definition, or CRD) is required to scrape a given target without resorting to scrape_configs.

By @hobbsh in #1853

Add support for dynamic header injection (Issue #1755)

The following are now possible in our YAML configuration for headers:

Insert static header

headers:
  all: # Header rules for all subgraphs
    request:
    - insert:
        name: "sent-from-our-apollo-router"
        value: "indeed"

Insert header from context

headers:
  all: # Header rules for all subgraphs
    request:
    - insert:
        name: "sent-from-our-apollo-router-context"
        from_context: "my_key_in_context"

Insert header from request body

headers:
  all: # Header rules for all subgraphs
    request:
    - insert:
        name: "sent-from-our-apollo-router-request-body"
        path: ".operationName" # It's a JSON path query to fetch the operation name from request body
        default: "UNKNOWN" # If no operationName has been specified
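The three insertion sources above can be modelled in Rust to show how each header value is resolved. HeaderSource, Insert and apply_inserts are hypothetical types, not the Router's configuration structs. The sketch makes two simplifying assumptions: the body path supports only a single top-level field such as ".operationName", and a missing context key inserts no header.

```rust
use std::collections::HashMap;

/// Where an inserted header's value comes from, mirroring the three YAML forms
/// above (hypothetical types for illustration).
enum HeaderSource {
    Static(String),
    FromContext(String),
    FromBody { path: String, default: String },
}

struct Insert {
    name: String,
    source: HeaderSource,
}

/// Apply the insert rules to a subgraph request's headers.
fn apply_inserts(
    rules: &[Insert],
    context: &HashMap<String, String>,
    body: &HashMap<String, String>,
    headers: &mut HashMap<String, String>,
) {
    for rule in rules {
        let value = match &rule.source {
            HeaderSource::Static(v) => Some(v.clone()),
            // Assumption: no header is inserted when the context key is missing.
            HeaderSource::FromContext(key) => context.get(key).cloned(),
            HeaderSource::FromBody { path, default } => {
                // Simplification: only a single top-level field like ".operationName".
                let field = path.trim_start_matches('.');
                Some(body.get(field).cloned().unwrap_or_else(|| default.clone()))
            }
        };
        if let Some(value) = value {
            headers.insert(rule.name.clone(), value);
        }
    }
}
```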

By @bnjjj in #1833

🐛 Fixes

Fix external secret support in our Helm chart (Issue #1750)

If an external secret is specified, e.g.:

helm install --set router.managedFederation.existingSecret="my-secret-name" <etc...>

...then the router should be deployed and configured to use the existing secret.

By @garypen in #1878

Do not erase errors when missing `_entities` (Issue #1863)

In a federated query, if the subgraph returned a response with errors and a null or absent data field, the Router was ignoring the subgraph error and instead returning an error complaining about the missing _entities field.

The Router will now aggregate the subgraph error and the missing _entities error.

By @Geal in #1870

Fix Prometheus annotation and healthcheck default

The Prometheus annotation was breaking on helm upgrade, so this fixes the template and also sets defaults. Additionally, the Helm chart now defaults the health check's listen address to 0.0.0.0:8088.

By @hobbsh in #1883

Move response formatting to the execution service (PR #1771)

The response formatting process (in which response data is filtered according to deferred responses subselections and the API schema) was being executed in the supergraph service. This was a bit late since it resulted in the execution service returning a stream of invalid responses leading to the execution plugins operating on invalid data.

By @Geal in #1771

Hide footer from "homepage" landing page (PR #1900)

Hides some incorrect language about customization on the landing page. Customizing the landing page currently requires additional support.

By @glasser in #1900

🛠 Maintenance

Update to Federation 2.1.3 (Issue #1880)

This brings in Federation 2.1.3, which includes updates to @apollo/federation via the relevant bump in router-bridge.

By @abernix in #1806

Update `reqwest` dependency to resolve DNS resolution failures (Issue #1899)

This should resolve intermittent failures to resolve DNS in Uplink which were occurring due to an upstream bug in the reqwest library.

By @abernix in #1806

Remove span details from log records (PR #1896)

Prior to this change, span details were written to log files. This was unwieldy and contributed to log bloat. Spans and logs are still linked in trace aggregators, such as Jaeger, and this change only affects the content written to the console output.

By @garypen in #1896

Change span attribute names in OpenTelemetry to be more consistent (PR #1876)

The span attributes in our OpenTelemetry tracing spans are corrected to be consistently namespaced with attributes that are compliant with the OpenTelemetry specification.

By @bnjjj in #1876

Have CI use rust-toolchain.toml and not install another redundant toolchain (Issue #1313)

Avoids redundant work in CI and makes the YAML configuration less misleading.

By @garypen in #1877

Query plan execution refactoring (PR #1843)

This splits the query plan execution into multiple modules to make the code more manageable.

By @Geal in #1843

Remove `Buffer` from APQ (PR #1641)

This removes tower::Buffer usage from the Automated Persisted Queries (APQ) implementation to improve reliability.

By @Geal in #1641

Remove `Buffer` from query deduplication (PR #1889)

This removes tower::Buffer usage from the query deduplication implementation to improve reliability.

By @Geal in #1889

Set MSRV to 1.63.0 (PR #1886)

We compile and test with 1.63.0 on CI at the moment, so it is our de-facto Minimum Supported Rust Version (MSRV).

Setting rust-version in Cargo.toml provides a more helpful error message when using an older version rather than unexpected compilation errors.

By @SimonSapin in #1886


22 Sep 13:09

apollo-bot2

v1.0.0

e81b96b

v1.0.0

Note

🤸 We've reached our initial v1.0.0 release. This project adheres to Semantic Versioning v2.0.0 and our version numbers follow the practices outlined in that specification. If you're updating from 1.0.0-rc.2 there is one breaking change to the API that is unlikely to affect you.

The migration steps from each pre-1.0 version will vary depending on which release you're coming from. To update from previous versions, you can consult the Release Notes for whichever version you are running and work your way to v1.0.0.

Our documentation has been updated to match our current v1.x state. In general, if you run the Router with your existing configuration, you should receive output indicating any values which are no longer valid and find their v1.0.0 equivalent in the updated documentation, or by searching the CHANGELOG.md for the prior configuration option to find when it changed.

Lastly, thank you for all of your positive and constructive feedback in our pre-1.0 stages. If you encounter any questions or feedback while updating to v1.0.0, please search for or open a GitHub Discussion or file a GitHub Issue if you find something working differently than it's documented.

We're excited about the path ahead! 👐

❗ BREAKING ❗

Removed `Request::from_bytes()` from public API (Issue #1855)

We've removed Request::from_bytes() from the public API. We were no longer using it internally and we hardly expect anyone external to have been relying on it, so it was worth making this one last breaking change prior to v1.0.0.

We discovered this function during an exercise of documenting our entire public API. While we considered keeping it, it didn't necessarily meet our requirements for shipping it in the public API. Its internal usage was removed in d147f97d as part of PR #429.

We're happy to consider re-introducing this in the future (it even has a matching Response::from_bytes() which it composes against nicely!), but we thought it was best to remove it for the time-being.

By @abernix in #1858

🚀 Features

Reintroduce health check (Issue #1861)

We have reintroduced a health check at the /health endpoint, served on a dedicated port (8088) rather than on the default GraphQL execution port (4000). We recommend moving away from the previous health check suggestion by consulting our health check configuration documentation. This endpoint acts as an "overall" health check for the Router, and we intend to add separate "liveness" and "readiness" checks on their own dedicated endpoints (e.g., /health/live and /health/ready) in the future. At that time, this root /health check will aggregate all other health checks to provide an overall health status. Today, however, it is simply a "liveness" check and we have not defined "readiness". We also intend to use port 8088 for other ("internal") functionality in the future, keeping the GraphQL execution endpoint dedicated to serving external client requests.

As for some additional context as to why we've brought it back so quickly: we had previously removed the health check we had been offering in PR #1766 because we wanted to do some additional configuration design and lean into a new "admin port" (8088). As a temporary solution, we suggested sending a GET request to the Router with a GraphQL query. After some new learnings and feedback, we've had to revisit that conversation earlier than we expected!

Due to default CSRF protections enabled in the Router, GET requests need to be accompanied by certain HTTP headers in order to disqualify them as being CORS-preflightable requests. While sending the additional header was reasonably straightforward in Kubernetes, other environments (including Google Kubernetes Engine's managed load balancers) didn't offer the ability to send those necessary HTTP headers along with their GET queries. So, the /health endpoint is back.

The health check endpoint is now exposed on 127.0.0.1:8088/health by default, and its listen socket address can be changed in the YAML configuration:

health-check:
  listen: 127.0.0.1:8088 # default
  enabled: true # default

The previous health-check suggestion (i.e., GET /?query={__typename}) will still work, so long as your infrastructure supports sending custom HTTP headers with HTTP requests. Again though, we recommend updating to the new health check.

By @o0Ignition0o and @BrynCooke in #1859

🐛 Fixes

Remove `apollo_private` and OpenTelemetry entries from logs (Issue #1862)

This change removes some apollo_private and OpenTelemetry (e.g., otel.kind) fields from the logs.

By @garypen and @bnjjj in #1868

Update and validate `Dockerfile` files (Issue #1854)

Several of the Dockerfiles in the repository were out-of-date with respect to recent configuration changes. We've updated the configuration files and extended our tests to catch this automatically in the future.

By @garypen in #1857

🛠 Maintenance

Disable Deno snapshotting when building inside `docs.rs`

This works around V8 linking errors and caters to specific build-environment constraints and requirements that exist on the Rust documentation site docs.rs.

By @SimonSapin in #1847

Add the Studio Uplink schema to the repository, with a test checking that it is up to date.

Previously we were downloading the Apollo Studio Uplink schema (which is used for fetching Managed Federation schema updates) at compile-time, which would fail in build environments without Internet access, like docs.rs' build system.

If an update is needed, the test failure will print a message with the command to run.

By @SimonSapin in #1847

Contributors

garypen, SimonSapin, and 4 other contributors


🐛 Fixes

Filter nullified deferred responses (Issue #2213)

Return root __typename when it is part of a query with a deferred fragment (Issue #1677)

Wait for opentelemetry tracer provider to shutdown (PR #2191)

Dispatch errors from the primary response to deferred responses (Issue #1818, Issue #2185)

Reconstruct deferred queries with knowledge about fragments (Issue #2105)

🛠 Maintenance

Improve plugin registration predictability (PR #2181)

it_rate_limit_subgraph_requests fixed (Issue #2213)

Update reports.proto protobuf definition (PR #2247)

Upgrade OpenTelemetry to 0.18 (Issue #1970)

Remove spaceport (Issue #2233)

Update to Rust 1.65 (Issue #2220)

Improve automated release (Pull #2220)

Use Elastic-2.0 license spdx (PR #2055)

Configuration

Protoc now required to build from source (Issue #1970)

Jaeger scheduled_delay moved to batch_processor->scheduled_delay (Issue #2232)

Contributors

v1.5.0

🚀 Features

Add configuration for trace ID (Issue #2080)

Add configuration for logging and add more logs (Issue #1998)

Provide multi-arch (amd64/arm64) Docker images for the Router (Issue #1932)

Add a supergraph configmap option to the helm chart (PR #2119)

Configuration upgrades (Issue #2123)

Experimental 🥼 subgraph request retry (Issue #338, Issue #1956)

Experimental 🥼 Caching configuration (Issue #2075)

@defer Apollo tracing support (Issue #1600)

🐛 Fixes

Router debug Docker images now run under the control of heaptrack (Issue #2135)

Handle mutations containing `@defer` (Issue #2099)

Add support for `urlencode()` / `decode()` in Rhai (Issue #2052)

Expose `query_plan` to `ExecutionRequest` in Rhai (PR #2081)

Move error messages about nullifying into `extensions` (Issue #2071)

Fix `Float` input-type coercion for default values with values larger than 32-bit (Issue #2087)

Assume `Accept: application/json` when no `Accept` header is present (Issue #1990)

`@skip` and `@include` implementation for root-level fragment use (Issue #2072)

Use `debian:bullseye-slim` as our base Docker image (PR #2085)

Update `apollo-parser` to `v0.3.2` (PR #2103)

Fix example `helm show values` command (PR #2088)

Add `trace_id` in logs to correlate entries from the same request (Issue #1981)

Fix the Rhai SDL `print` function (Issue #2005)

Export `router_factory::Endpoint` (PR #2007)

Address regression when sending gRPC to `localhost` (Issue #2036)

Apply traffic-shaping directly to `supergraph` and `subgraph` (PR #2034)

Remove references to Git submodules from `DEVELOPMENT.md` (Issue #2012)

Assume `Accept: application/json` when no `Accept` header is present (Issue #1995)

Prefix the Prometheus metrics with `apollo_router_` (Issue #1915)

Fix `--hot-reload` in Kubernetes and Docker (Issue #1476)

Prometheus: make sure `apollo_router_http_requests_error_total` and `apollo_router_http_requests_total` are incremented (PR #1953)

Set `no_delay` and `keepalive` on subgraph requests (Issue #1905)

Update `docker-compose` and `Dockerfile`s now that the submodules have been removed (PR #1950)

Fix logic around `Accept` headers and multipart responses (PR #1923)

`@defer`: duplicated errors across incremental items (Issue #1834, Issue #1818)

Remove support for `rhai.input_file` from the helm chart (Issue #1826)