Releases: apollographql/router
v1.19.0-alpha.0
v1.18.1
🐛 Fixes
Fix multipart response compression by using a large enough buffer
When writing a deferred response, if the output buffer was too small to write the entire compressed response, the compressor would write a small chunk that did not decompress to the entire primary response, and would then wait for the next response to send the rest.
Unfortunately, we cannot know the required output size in advance, and if we ask the decoder, it tells us that it flushed all the data, even if it could have sent more. To compensate, we raise the output buffer size and, if necessary, grow the buffer a second time after flushing.
Emit more log details to the state machine's Running phase (Issue #3065)
This change adds details about the triggers of potential state changes to the logs and also makes it easier to see when an un-entitled event causes a state change to be ignored.
Prior to this change, it was difficult to know from the logs why a router state reload had been triggered and the logs didn't make it clear that it was possible that the state change was going to be ignored.
Respect GraphOS/Studio metric "backoff" guidance (Issue #2888)
For stability reasons, GraphOS metric ingress will return an HTTP 429 status code with Retry-After guidance if it's unable to immediately accept a metric submission from a router. A router instance should not try to submit further metrics until that amount of time (in seconds) has elapsed. This fix provides support for this interaction.
While observing a backoff request from GraphOS, the router will continue to collect metrics and no metrics are lost unless the router terminates before the timeout expires.
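The backoff interaction described above can be sketched roughly as follows. This is a minimal illustration, not the router's implementation: `MetricsSubmitter`, the `send` callback and the injectable `clock` are hypothetical names introduced only for this example, and the sketch assumes Retry-After carries a number of seconds.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical transport: returns (status_code, retry_after_seconds or None).
SendFn = Callable[[List[dict]], Tuple[int, Optional[int]]]


class MetricsSubmitter:
    """Buffers metrics and pauses submission while a Retry-After backoff is active."""

    def __init__(self, send: SendFn, clock: Callable[[], float] = time.monotonic):
        self._send = send
        self._clock = clock
        self._pending: List[dict] = []
        self._resume_at = 0.0  # earliest time at which we may submit again

    def record(self, metric: dict) -> None:
        # Collection continues even during a backoff window.
        self._pending.append(metric)

    def try_submit(self) -> bool:
        """Attempt one submission; returns True if the buffer was accepted."""
        if not self._pending or self._clock() < self._resume_at:
            return False
        status, retry_after = self._send(list(self._pending))
        if status == 429:
            # Keep the metrics and wait for the advised number of seconds.
            self._resume_at = self._clock() + (retry_after or 0)
            return False
        if 200 <= status < 300:
            self._pending.clear()
            return True
        return False
```

Because the pending metrics are only cleared after a successful submission, nothing collected during the backoff window is dropped while the process stays alive.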
🛠 Maintenance
Refactor the way we're redacting errors for Apollo telemetry
This follows up on the changes to the federated subgraph trace error redaction mechanism, which first appeared in v1.16.0 via PR #3011, with some internal refactoring that improves the readability of the logic. There should be no functional changes to the feature's behavior.
v1.18.0
🚀 Features
Introduced new metric which tracks query planning time
We've introduced an apollo_router_query_planning_time histogram which captures time spent in the query planning phase. This is documented along with our other metrics in the documentation.
🐛 Fixes
Small gzip'd responses no longer cause a panic
A regression introduced in v1.17.0, again related to compression, has been resolved. It occurred when small responses used invalid buffer management, causing a panic.
HTTP status codes are now returned in SubrequestHttpError as intended
When contextually available, the HTTP status code is included within SubrequestHttpError. This provides plugins the ability to access the status code directly. Previously, only string parsing of the reason could be used to determine the status code.
This corrects a previous contribution which added the status code, but neglected to serialize it properly into the extensions in the response which are made available to plugins. Thank you to the same contributor for the correction!
By @scottdouglas1989 in #3005
📚 Documentation
Indicate that apollo_router_cache_size is a count of cache entries
This follows up on PR #2607, which added apollo_router_cache_size. It adds apollo_router_cache_size to the documentation and indicates that this is the number of cache entries (that is, a count).
v1.17.0
🚀 Features
GraphOS Enterprise: Operation Limits
You can define operation limits in your router's configuration to reject potentially malicious requests. An operation that exceeds any specified limit is rejected.
You define operation limits in your router's YAML config file, like so:
preview_operation_limits:
  max_depth: 100
  max_height: 200
  max_aliases: 30
  max_root_fields: 20
See the operation limits documentation for details on setting up this GraphOS Enterprise feature.
By @SimonSapin, @lrlna, and @StephenBarlow
🐛 Fixes
Ensure the compression state is flushed (Issue #3035)
In some cases, the "finish" call to flush the compression state at the end of a request was not flushing the entire state, and it had to be called multiple times.
This fixes a regression introduced in v1.16.0 by #2986 which resulted in larger responses being truncated after compression.
🛠 Maintenance
Make test_experimental_notice assertion more targeted (Pull #3036)
Previously this test relied on a full snapshot of the log message. This was likely to result in failures, either due to environmental reasons or other unrelated changes.
The test now relies on a more targeted assertion that is less likely to fail under various conditions.
By @BrynCooke in #3036
v1.16.0
🚀 Features
Add ability to transmit un-redacted errors from federated traces to Apollo Studio
When using subgraphs which are enabled with Apollo Federated Tracing, the error messages within those traces will be redacted by default.
New configuration (telemetry.apollo.errors.subgraph.all.redact, which defaults to true) enables or disables the redaction mechanism. Similar configuration (telemetry.apollo.errors.subgraph.all.send, which also defaults to true) enables or disables the entire transmission of the error to Studio.
The error messages returned to the clients are not changed or redacted from their previous behavior.
To enable sending subgraph's federated trace error messages to Studio without redaction, you can set the following configuration:
telemetry:
  apollo:
    errors:
      subgraph:
        all:
          send: true # (true = Send to Studio, false = Do not send; default: true)
          redact: false # (true = Redact full error message, false = Do not redact; default: true)
It is also possible to configure this per-subgraph using a subgraphs map at the same level as all in the configuration, much like other sections of the configuration which have subgraph-specific capabilities:
telemetry:
  apollo:
    errors:
      subgraph:
        all:
          send: true
          redact: false # Disable redaction as a default. The `accounts` service enables it below.
        subgraphs:
          accounts: # Applies to the `accounts` subgraph, overriding the `all` global setting.
            redact: true # Redact messages from the `accounts` service.
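To make the override behavior concrete, here is a small sketch of how an effective setting could be resolved for one subgraph. It is not the router's code: `effective_error_settings` is a hypothetical helper, and the assumption that keys missing from a per-subgraph entry fall back to `all` and then to the documented defaults is ours.

```python
from typing import Any, Dict

# Defaults stated in the release notes: both settings default to true.
DEFAULTS = {"send": True, "redact": True}


def effective_error_settings(subgraph_cfg: Dict[str, Any], name: str) -> Dict[str, bool]:
    """Merge defaults, then `all`, then the per-subgraph entry (most specific wins)."""
    settings = dict(DEFAULTS)
    settings.update(subgraph_cfg.get("all") or {})
    settings.update((subgraph_cfg.get("subgraphs") or {}).get(name) or {})
    return settings


# Mirrors the `subgraph` section of the YAML above, already parsed into a dict.
config = {
    "all": {"send": True, "redact": False},
    "subgraphs": {"accounts": {"redact": True}},
}
```

Under these assumptions, `effective_error_settings(config, "accounts")` keeps redaction on for `accounts`, while any other subgraph name inherits `redact: false` from `all`.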
Introduce response.is_primary Rhai helper for working with deferred responses (Issue #2935) (Issue #2936)
A new Rhai response.is_primary() helper has been introduced that returns false when the current chunk being processed is a deferred response chunk. Put another way, it will be false if the chunk is a follow-up response to the initial primary response, during the fulfillment of a @defer'd fragment in a larger operation. For the initial response, is_primary() == true. This aims to provide the right primitives so users can write more defensive error checking. The introduction of this relates to a bug fix noted in the Fixes section below.
Time-based forced hot-reload for "chaos" testing
For testing purposes, the Router can now artificially be forced to hot-reload (as if the configuration or schema had changed) at a configured time interval. This can help reproduce issues like reload-related memory leaks. We don't recommend using this in any production environment. (If you are compelled to use it in production, please let us know about your use case!)
The new configuration section for this "chaos" testing is (and will likely remain) marked as "experimental":
experimental_chaos:
  force_hot_reload: 1m
By @SimonSapin in #2988
Provide helpful console output when using "preview" features, just like "experimental" features
This expands on the existing mechanism that was originally introduced in #2242, which supports the notion of an "experimental" feature, and makes it compatible with the notion of "preview" features.
When preview or experimental features are used, an INFO-level log emitted during startup lists the features in use and shows URLs to their GitHub discussions, for feedback. Additionally, the router config experimental and router config preview CLI sub-commands list all such features in the current Router version, regardless of which are used in a given configuration file.
For more information about launch stages, please see the documentation here: https://www.apollographql.com/docs/resources/product-launch-stages/
By @o0Ignition0o, @abernix, and @SimonSapin in #2960
Report operationCountByType counts to Apollo Studio (PR #2979)
This adds the ability for Studio to track operation counts broken down by type of operation (e.g., query vs mutation). Previously, we only reported total operation count.
🐛 Fixes
Update to Federation v2.4.2
This update to Federation v2.4.2 fixes a potential bug when an @interfaceObject type has a @requires. This might be encountered when an @interfaceObject type has a field with a @requires and the query requests that field only for some specific implementations of the corresponding interface. In this case, the generated query plan was sometimes invalid and could result in an invalid query to a subgraph. If the subgraph was an Apollo Server implementation, this led to the subgraph producing a "The _entities resolver tried to load an entity for type X, but no object or interface type of that name was found in the schema" error.
Fix handling of deferred response errors from Rhai scripts (Issue #2935) (Issue #2936)
If a Rhai script raised an error while processing a deferred response (i.e., an operation which uses @defer), the Router ignored the error and returned None in the stream of results. This had two unfortunate aspects:
- the error was not propagated to the client
- the stream was terminated (silently)
With this fix we now capture the error and still propagate the response to the client. This fix also adds support for the is_primary() method, which may be invoked on both supergraph_service() and execution_service() responses. It may be used to avoid implementing exception handling for header interactions and to determine whether a response is primary (i.e., first) or not.
e.g.:
if response.is_primary() {
    print(`all response headers: ${response.headers}`);
} else {
    print(`don't try to access headers`);
}
vs
try {
    print(`all response headers: ${response.headers}`);
}
catch(err) {
    if err == "cannot access headers on a deferred response" {
        print(`don't try to access headers`);
    }
}
Note
This is a minimal example for purposes of illustration which doesn't exhaustively check all error conditions. An exception handler should always handle all error conditions.
Fix incorrectly placed "message" in Rhai JSON-formatted logging (Issue #2777)
This fixes a bug where Rhai logging was incorrectly putting the message of the log into the out attribute, when serialized as JSON. Previously, the message field was showing rhai_{{level}} (i.e., rhai_info), despite there being a separate level field in the JSON structure.
The impact of this fix can be seen in this example where we call log_info() in a Rhai script:
log_info("this is info");
Previously, this would result in a log as follows, with the text of the message set within out, rather than message.
{"timestamp":"2023-04-19T07:46:15.483358Z","level":"INFO","message":"rhai_info","out":"this is info"}
After the change, the message is correctly within message. The level continues to be available at level. We've also added a target property which shows the file that produced the log entry:
{"timestamp":"2023-04-19T07:46:15.483358Z","level":"INFO","message":"this is info","target":"src/rhai_logging.rhai"}
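For log consumers that need to handle both shapes during an upgrade, a tolerant reader could look something like the sketch below. It only uses the two sample lines shown above; `log_text` is a hypothetical helper, not part of any router tooling.

```python
import json

before = '{"timestamp":"2023-04-19T07:46:15.483358Z","level":"INFO","message":"rhai_info","out":"this is info"}'
after = '{"timestamp":"2023-04-19T07:46:15.483358Z","level":"INFO","message":"this is info","target":"src/rhai_logging.rhai"}'


def log_text(line: str) -> str:
    """Return the human-readable text of a Rhai log line in either format."""
    record = json.loads(line)
    # Older output put the text in `out` and a `rhai_<level>` marker in `message`.
    if "out" in record and record.get("message", "").startswith("rhai_"):
        return record["out"]
    return record["message"]
```

Both sample lines yield the same text, "this is info", when passed through `log_text`.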
Deferred responses now utilize compression, when requested (Issue #1572)
We previously had to disable compression on deferred responses due to an upstream library bug. To fix this, we've replaced tower-http's CompressionLayer with a custom stream transformation. This is necessary because tower-http uses async-compression under the hood, which buffers data until the end of the stream, analyzes it, then writes it, ensuring better compression. However, this is wholly incompatible with a core concept of the multipart protocol for @defer, which requires chunks to be sent as soon as possible. To support that, we need to compress chunks independently.
This extracts parts of the codec module of async-compression, which so far is not public, and makes a streaming wrapper above it that flushes the compressed data on every response within the stream.
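The idea of flushing the compressor after every response in the stream can be illustrated with Python's standard zlib module. This is only an analogy under stated assumptions: the router's wrapper is built on async-compression in Rust, whereas `compress_chunks` and `decompress_incrementally` below are hypothetical helpers, and the sketch uses the zlib container rather than gzip.

```python
import zlib
from typing import Iterable, Iterator


def compress_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress a stream, flushing after every chunk so each one can be decoded on arrival."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        # Z_SYNC_FLUSH emits everything buffered so far without ending the stream.
        yield compressor.compress(chunk) + compressor.flush(zlib.Z_SYNC_FLUSH)
    # Finish the stream (writes the trailer).
    yield compressor.flush(zlib.Z_FINISH)


def decompress_incrementally(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Decode each compressed frame as soon as it arrives."""
    decompressor = zlib.decompressobj()
    for frame in frames:
        yield decompressor.decompress(frame)
```

With a sync flush after each chunk, a client can fully decode the primary response before any deferred chunk has been produced, at the cost of a slightly worse compression ratio than buffering the whole stream.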
Update the h2 dependency to fix a potential Denial-of-Service (DoS) vulnerability
Proactively addresses the advisory in https://rustsec.org/advisories/RUSTSEC-2023-0034, though we have no evidence that suggests it has been exploited on any Router deployment.
Rate limit errors emitted from OpenTelemetry (Issue #2953)
When a batch span exporter is unable to accept a span because...
v1.16.0-alpha.0
v1.15.1
🐛 Fixes
Resolve Docker unrecognized subcommand error (Issue #2966)
We've repaired the Docker build of the v1.15.0 release, which broke due to the introduction of syntax in the Dockerfile which can only be used by the docker buildx tooling, which leverages Moby BuildKit.
Furthermore, the change didn't apply to the diy ("do-it-yourself") image, and we'd like to prevent the two Dockerfiles from deviating more than necessary.
Overall, this reverts apollographql/router#2925.
Helm Chart extraContainers
This is another iteration on the functionality for supporting side-cars within Helm charts, which is quite useful for coprocessor configurations.
📃 Configuration
Treat Helm extraLabels as templates
It is now possible to use data from Helm's Values or Chart objects to add additional labels to Kubernetes Deployments or Pods.
As of this release, the following example:
extraLabels:
  env: {{ .Chart.AppVersion }}
... will now result in:
labels:
  env: "v1.2.3"
Previously, this would have resulted in merely emitting the untemplatized {{ .Chart.AppVersion }} value, resulting in an invalid label.
By @gscheibel in #2962
v1.15.0
🚀 Features
GraphOS Enterprise: Allow JWT algorithm restrictions (Issue #2714)
It is now possible to restrict the list of accepted algorithms to a well-known set for cases where an issuer's JSON Web Key Set (JWKS) contains keys which are usable with multiple algorithms.
🐛 Fixes
Invalid requests now return proper GraphQL-shaped errors (Issue #2934), (Issue #2946)
Unsupported content-type and accept headers sent on requests now return proper GraphQL errors nested as elements in a top-level errors array, rather than returning a single GraphQL error JSON object.
This also introduces a new error code, INVALID_CONTENT_TYPE_HEADER, rather than using INVALID_ACCEPT_HEADER when an invalid content-type header was received.
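As an illustration of the difference between a bare error object and a GraphQL-shaped response, the sketch below builds and checks the two forms. The placement of the code under `extensions.code` and the message text are assumptions made for this example; `graphql_error_response` and `is_graphql_shaped` are hypothetical helpers.

```python
import json
from typing import Any, Dict


def graphql_error_response(message: str, code: str) -> Dict[str, Any]:
    """Wrap a single error in the top-level `errors` array."""
    return {"errors": [{"message": message, "extensions": {"code": code}}]}


def is_graphql_shaped(body: Dict[str, Any]) -> bool:
    """True when the body has a non-empty `errors` list whose items carry a message."""
    errors = body.get("errors")
    return (
        isinstance(errors, list)
        and len(errors) > 0
        and all(isinstance(e, dict) and "message" in e for e in errors)
    )


# The message text is illustrative only.
response = graphql_error_response(
    "unsupported content-type header", "INVALID_CONTENT_TYPE_HEADER"
)
print(json.dumps(response, indent=2))
```

A client that iterates over `errors` can handle this response the same way it handles any other GraphQL error, which was not possible with a single top-level error object.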
By @EverlastingBugstopper in #2947
🛠 Maintenance
Remove redundant println!() that broke JSON-formatted logging (PR #2923)
The println!() statement being used in our trace transmission logic was redundant since it was already covered by a pre-existing WARN log line. Most disruptively though, it broke JSON logging.
For example, this previously showed as:
Got error sending request for url (https://example.com/api/ingress/traces): connection error: unexpected end of file
{"timestamp":"2023-04-11T06:36:27.986412Z","level":"WARN","message":"attempt: 1, could not transfer: error sending request for url (https://example.com/api/ingress/traces): connection error: unexpected end of file"}
It will now merely log the second line.
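To see why the stray line was disruptive, consider a log consumer that expects one JSON object per line. The sketch below runs the two lines from the example above through such a check; `unparseable_lines` is a hypothetical helper written for this illustration.

```python
import json
from typing import List

output = [
    "Got error sending request for url (https://example.com/api/ingress/traces): connection error: unexpected end of file",
    '{"timestamp":"2023-04-11T06:36:27.986412Z","level":"WARN","message":"attempt: 1, could not transfer: error sending request for url (https://example.com/api/ingress/traces): connection error: unexpected end of file"}',
]


def unparseable_lines(lines: List[str]) -> List[int]:
    """Return the indexes of lines that are not valid JSON objects."""
    bad = []
    for index, line in enumerate(lines):
        try:
            if not isinstance(json.loads(line), dict):
                bad.append(index)
        except json.JSONDecodeError:
            bad.append(index)
    return bad
```

Only the first line, the one produced by println!(), fails to parse; with it removed, every line of the output is a valid JSON object.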
Adds HTTP status code to subgraph HTTP error type
When contextually available, the SubrequestHttpError now includes the HTTP status code. This provides plugins with the ability to access the status code directly. Previously, parsing the reason value as a string was the only way to determine the status code.
By @scottdouglas1989 in #2902
Pin the router-bridge version
When using the router as a library, router-bridge versions can be automatically updated, which can result in incompatibilities. We want to ensure that the Router and router-bridge always work with vetted versions, so we now pin it in our Cargo.toml and update it using our tooling.
Update to Federation v2.4.1 (2937)
The Router has been updated to use Federation v2.4.1, which includes a fix involving @interfaceObject.
By @o0Ignition0o in #2957
v1.14.0
🚀 Features
GraphOS Enterprise: Coprocessor read access to request uri, method and HTTP response status codes (Issue #2861, Issue #2861)
We've added the ability for coprocessors to have read-only access to additional contextual information at the RouterService and SubgraphService stages:
The RouterService stage now has read-only access to these client request properties:
- path (e.g., /graphql)
- method (e.g., POST, GET)
The RouterService stage now has read-only access to these client response properties:
- status_code (e.g., 403, 200)
The SubgraphService stage now has read-only access to these subgraph response properties:
- status_code (e.g., 503, 200)
By @o0Ignition0o in #2863
🐛 Fixes
Coprocessors: Empty body requests from GET requests are now deserialized without error
Fixes a bug where a coprocessor operating at the router_request stage would fail to deserialize an empty body, which is typical for GET requests.
By @o0Ignition0o in #2863
📃 Configuration
Helm: Router chart now supports extraLabels for Deployments/Pods
Our Helm chart now supports a new value called extraLabels, which enables chart users to add custom labels to the Router Deployment and its Pods.
By @gscheibel in #2903
Helm: Router chart now supports extraContainers to run sidecars
Our Helm chart now supports extraContainers, which makes it simpler to run containers alongside Router containers (sidecars), a useful pattern for coprocessors.
Migrate away from unimplemented coprocessor.subgraph.all.response.uri
We have removed a completely unimplemented coprocessor.subgraph.all.response.uri key from our configuration. It had no effect, but we will automatically migrate configurations which did use it, so this removal introduces no breaking changes.
By @o0Ignition0o in #2863
📚 Documentation
Update coprocessor documentation to reflect newly added fields (Issue #2886)
The External coprocessing documentation is now up to date, with a full configuration example, and the newly added fields.
By @o0Ignition0o in #2863
Example: Rhai-based cache-control response header management
A new Rhai example demonstrates how to recreate some of the behavior of Apollo Gateway's subgraph cache-control response header behavior. This addresses some of the need identified in #326.
By @lennyburdette in #2759
v1.13.2
🐛 Fixes
Replace the old query planner with the incoming query planner on reload
We've fixed an important regression in v1.13.1 (introduced by PR #2706) which resulted in Routers failing to update to newer supergraphs unless they were fully restarted; hot-reloads of the supergraph did not work properly. This affects all v1.13.1 versions, whether the supergraph was delivered from a local file or if delivered as part of Managed Federation through Apollo Uplink.