Releases: apollographql/router
v1.23.0
🚀 Features
Add --listen to CLI args (PR #3296)
Adds --listen to CLI args, which allows the user to specify the address to listen on.
It can also be set via the environment variable APOLLO_ROUTER_LISTEN_ADDRESS.
router --listen 0.0.0.0:4001
By @ptondereau and @BrynCooke in #3296
Move operation limits and parser limits to General Availability (PR #3356)
Operation Limits (a GraphOS Enterprise feature) and parser limits are now moving to General Availability, from Preview where they have been since Apollo Router 1.17.
For more information about launch stages, please see the documentation here: https://www.apollographql.com/docs/resources/product-launch-stages/
In addition to removing the preview_ prefix, the configuration section has been renamed to just limits to encapsulate operation, parser and request limits. (The request size limit is still experimental.) Existing configuration files will keep working as before, but with a warning output to the logs. To fix that warning, rename the configuration section like so:
-preview_operation_limits:
+limits:
   max_depth: 100
   max_height: 200
   max_aliases: 30
   max_root_fields: 20
By @SimonSapin in #3356
Add support for readiness/liveness checks (Issue #3233)
Kubernetes lifecycle interop has been improved by implementing liveness and readiness checks.
Kubernetes considers a service to be:
- live - if it isn't deadlocked
- ready - if it is able to start accepting traffic
(For more details: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
The existing health check didn't surface this information. Instead, it returned a payload indicating whether the router is "healthy", and that payload was always "UP" (hard-coded).
The router health check now exposes this information based on the following rules:
- Live
  - Is not in state Errored
  - Health check enabled and responding
- Ready
  - Is running and accepting requests.
  - Is Live
To maintain backwards compatibility, query parameters named "ready" and "live" have been added to our existing health endpoint. Both POST and GET are supported.
Sample queries:
curl -XPOST "http://localhost:8088/health?ready" OR curl "http://localhost:8088/health?ready"
curl -XPOST "http://localhost:8088/health?live" OR curl "http://localhost:8088/health?live"
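The same probes can be issued from a script. The following is a minimal sketch, assuming the router listens on http://localhost:8088 as in the curl samples above; the helper names health_url and probe are hypothetical and not part of the router.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def health_url(base: str, kind: Optional[str] = None) -> str:
    """Build the health endpoint URL, optionally with the ?ready or ?live query."""
    if kind not in (None, "ready", "live"):
        raise ValueError(f"unknown probe kind: {kind!r}")
    url = base.rstrip("/") + "/health"
    return url if kind is None else f"{url}?{kind}"


def probe(base: str, kind: str, timeout: float = 2.0) -> Tuple[int, str]:
    """Send a GET probe and return (HTTP status, response body)."""
    try:
        with urllib.request.urlopen(health_url(base, kind), timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # A non-2xx answer still carries a status and a body worth reporting.
        return err.code, err.read().decode()


# Example (requires a running router):
#   probe("http://localhost:8088", "ready")
#   probe("http://localhost:8088", "live")
```

Connection failures (for example, when nothing is listening) surface as urllib.error.URLError and are deliberately left to the caller.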
Include path to Rhai script in syntax error messages
Syntax errors in the main Rhai script will now include the path to the script in the error message.
Experimental support for GraphQL validation in Rust
We are experimenting with a new GraphQL validation implementation written in Rust. The legacy implementation is part of the JavaScript query planner. This is part of a project to remove JavaScript from the Router to improve performance and memory behavior.
To opt in to the new validation implementation, set:
experimental_graphql_validation_mode: new
Or use both to run the implementations side by side and log a warning if there is a difference in results:
experimental_graphql_validation_mode: both
This is an experimental option while we are still finding edge cases in the new implementation, and will be removed once we have confidence that parity has been achieved.
By @goto-bus-stop in #3134
Add environment variable access to rhai (Issue #1744)
This introduces support for accessing environment variables within Rhai. The new env module contains one function and is imported by default.
By @garypen in #3240
Add support for getting request method in Rhai (Issue #2467)
This adds support for getting the HTTP method of requests in Rhai.
fn process_request(request) {
    if request.method == "OPTIONS" {
        request.headers["x-custom-header"] = "value"
    }
}
Add additional build functionality to the diy build script (Issue #3303)
The diy build script is useful for ad-hoc image creation during testing or for building your own images based on a router repo. This set of enhancements makes it possible to:
- build docker images from arbitrary (nightly) builds (-a)
- build an amd64 docker image on an arm64 machine (or vice versa) (-m)
- change the name of the image from the default 'router' (-n)
Note: the build machine image architecture is used if the -m flag is not supplied.
🐛 Fixes
Bring root span name in line with otel semantic conventions. (Issue #3229)
Root span name has changed from request to <graphql.operation.kind> <graphql.operation.name>.
The OpenTelemetry GraphQL semantic conventions specify that the root span name must match the operation kind and name.
Many tracing providers don't have good support for filtering traces via attribute, so changing this significantly enhances the tracing experience.
By @BrynCooke in #3364
An APQ query with a mismatched hash will error as HTTP 400 (Issue #2948)
The Gateway and the Router now behave the same way. Although the previous behavior was tolerable, a mismatched hash indicates a misconfigured client and should be rejected early.
Previously, if a client sent an operation with an APQ hash that did not match it, we would merely log an error to the console and not register the operation (for the next request), but still execute the query. We now return a GraphQL error and don't execute the query. No clients should be impacted by this, though anyone who had hand-crafted a query with APQ information (for example, copied a previous APQ-registration query but only changed the operation without re-calculating the SHA-256) might now be forced to use the correct hash (or more practically, remove the hash).
By @o0Ignition0o in #3128
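For anyone hand-crafting APQ requests, the hash has to be the SHA-256 hex digest of the exact query string being sent. The sketch below shows one way to compute it and to check a claimed hash before sending. The function names are hypothetical; the extensions.persistedQuery layout (version 1, sha256Hash) follows the APQ protocol.

```python
import hashlib
import json


def apq_hash(query: str) -> str:
    """SHA-256 hex digest of the exact query text."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def apq_request_body(query: str) -> str:
    """JSON body that registers `query` together with its matching hash."""
    return json.dumps(
        {
            "query": query,
            "extensions": {
                "persistedQuery": {"version": 1, "sha256Hash": apq_hash(query)}
            },
        }
    )


def hash_matches(query: str, claimed_hash: str) -> bool:
    """True when the claimed hash really belongs to this query text."""
    return apq_hash(query) == claimed_hash
```

Any change to the operation text, including whitespace, yields a different digest, which is why a copied registration request with an edited operation is now rejected.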
fix(subscription): take the callback url path from the configuration (Issue #3361)
Previously, the value specified in subscription.mode.callback.path was ignored and a hardcoded /callback path was used instead. The Router now uses the path specified in the configuration.
Preserve all shutdown receivers across reloads (Issue #3139)
We keep a list of all active requests and process all of them during shutdown. This will avoid prematurely terminating connections when:
- some requests are in flight
- the router reloads (new schema, etc)
- the router gets a shutdown signal
Enable serde_json float_roundtrip feature (Issue #2951)
The Router now preserves JSON floating point numbers exactly as they are received by enabling the serde_json float_roundtrip feature:
Use sufficient precision when parsing fixed precision floats from JSON to ensure that they maintain accuracy when round-tripped through JSON. This comes at an approximately 2x performance cost for parsing floats compared to the default best-effort precision.
Fix deferred response formatting when filtering queries (PR #3298, Issue #3263, PR #3339)
Filtering queries requires two levels of response formatting, and its implementation highlighted issues with deferred responses. Response formatting needs to recognize which deferred fragment generated it, and that the deferred response shapes can change depending on request variables, due to the @defer directive's if argument.
For now, this is solved by generating the response shapes for primary and deferred responses, for each combination of the variables used in @defer applications, limited to 32 unique variables. There will be follow-up work with another approach that removes this limitation.
By @Geal and @SimonSapin in #3298, #3263 and #3339
...
v1.23.0-alpha.0
1.23.0-alpha.0
v1.22.0
🚀 Features
Federated Subscriptions (PR #3285)
⚠️ This is an Enterprise feature of the Apollo Router. It requires an organization with a GraphOS Enterprise plan. If your organization doesn't currently have an Enterprise plan, you can test out this functionality by signing up for a free Enterprise trial.
High-Level Overview
What are Federated Subscriptions?
This PR adds GraphQL subscription support to the Router for use with Federation. Clients can now use GraphQL subscriptions with the Router to receive realtime updates from a supergraph. With these changes, subscription operations are now a first-class supported feature of the Router and Federation, alongside queries and mutations.
Client to Router Communication
- Apollo has designed and implemented a new open protocol for handling subscriptions called multipart subscriptions
- With this new protocol clients can manage subscriptions with the Router over tried and true HTTP; WebSockets, SSE (server-sent events), etc. are not needed
- All Apollo clients (Apollo Client web, Apollo Kotlin, Apollo iOS) have been updated to support multipart subscriptions, and can be used out of the box with little to no extra configuration
- Subscription communication between clients and the Router must use the multipart subscription protocol, meaning only subscriptions over HTTP are supported at this time
Router to Subgraph Communication
- The Router communicates with subscription enabled subgraphs using WebSockets
- By default, the router sends subscription requests to subgraphs using the graphql-transport-ws protocol which is implemented in the graphql-ws library. You can also configure it to use the graphql-ws protocol which is implemented in the subscriptions-transport-ws library.
- Subscription ready subgraphs can be introduced to Federation and the Router as is - no additional configuration is needed on the subgraph side
Subscription Execution
When the Router receives a GraphQL subscription request, the generated query plan will contain an initial subscription request to the subgraph that contributed the requested subscription root field.
For example, as a result of a client sending this subscription request to the Router:
subscription {
  reviewAdded {
    id
    body
    product {
      id
      name
      createdBy {
        name
      }
    }
  }
}
The router will send this request to the reviews subgraph:
subscription {
  reviewAdded {
    id
    body
    product {
      id
    }
  }
}
When the reviews subgraph receives new data from its underlying source event stream, that data is sent back to the Router. Once received, the Router continues following the determined query plan to fetch any additional required data from other subgraphs:
Example query sent to the products subgraph:
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on Product {
      name
      createdBy {
        __typename
        email
      }
    }
  }
}
Example query sent to the users subgraph:
query ($representations: [_Any!]!) {
  _entities(representations: $representations) {
    ... on User {
      name
    }
  }
}
When the Router finishes running the entire query plan, the data is merged back together and returned to the requesting client over HTTP (using the multipart subscriptions protocol).
Configuration
Here is a configuration example:
subscription:
  mode:
    passthrough:
      all: # The router uses these subscription settings UNLESS overridden per-subgraph
        path: /subscriptions # The path to use for subgraph subscription endpoints (Default: /ws)
      subgraphs: # Overrides subscription settings for individual subgraphs
        reviews: # Overrides settings for the 'reviews' subgraph
          path: /ws # Overrides '/subscriptions' defined above
          protocol: graphql_transport_ws # The WebSocket-based protocol to use for subscription communication (Default: graphql_ws)
Usage Reporting
Subscription use is tracked in the Router as follows:
- Subscription registration: The initial subscription operation sent by a client to the Router that's responsible for starting a new subscription
- Subscription notification: The resolution of the client subscription’s selection set in response to a subscription enabled subgraph source event
Subscription registration and notification (with operation traces and statistics) are sent to Apollo Studio for observability.
Advanced Features
This PR includes the following configurable performance optimizations.
Deduplication
- If the Router detects that a client is using the same subscription as another client (i.e., a subscription with the same HTTP headers and selection set), it will avoid starting a new subscription with the requested subgraph. The Router will reuse the same open subscription instead, and will send the same source events to the new client.
- This helps reduce the number of WebSockets that need to be opened between the Router and subscription enabled subgraphs, thereby drastically reducing Router to subgraph network traffic and overall latency
- For example, if 100 clients are subscribed to the same subscription there will be 100 open HTTP connections from the clients to the Router, but only 1 open WebSocket connection from the Router to the subgraph
- Subscription deduplication between the Router and subgraphs is enabled by default (but can be disabled via the Router config file)
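The deduplication idea can be sketched as a registry keyed by the request headers and the subscription document. This is only an illustration of the behavior described above: the class name, the header normalization, and the whitespace normalization are assumptions, not the Router's actual implementation.

```python
from typing import Dict, FrozenSet, Mapping, Tuple

# Hypothetical dedup key: normalized headers plus normalized subscription text.
Key = Tuple[FrozenSet[Tuple[str, str]], str]


class SubscriptionRegistry:
    """Tracks which subscriptions already have an open subgraph connection."""

    def __init__(self) -> None:
        self._listeners: Dict[Key, int] = {}
        self.upstream_opened = 0  # number of subgraph connections opened so far

    @staticmethod
    def _key(headers: Mapping[str, str], document: str) -> Key:
        normalized = frozenset((name.lower(), value) for name, value in headers.items())
        return normalized, " ".join(document.split())

    def subscribe(self, headers: Mapping[str, str], document: str) -> bool:
        """Register a client; return True only if a new upstream connection was needed."""
        key = self._key(headers, document)
        if key in self._listeners:
            self._listeners[key] += 1  # reuse the existing upstream subscription
            return False
        self._listeners[key] = 1
        self.upstream_opened += 1
        return True
```

With 100 clients sending identical headers and the same subscription, subscribe returns True once and upstream_opened stays at 1, matching the example in the list above.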
Callback Mode
- Instead of sending subscription data between a Router and subgraph over an open WebSocket, the Router can be configured to send the subgraph a callback URL that will then be used to receive all source stream events
- Subscription enabled subgraphs send source stream events (subscription updates) back to the callback URL by making HTTP POST requests
- Refer to the callback mode documentation for more details, including an explanation of the callback URL request/response payload format
- This feature is still experimental and needs to be enabled explicitly in the Router config file
By @bnjjj and @o0Ignition0o in #3285
v1.22.0-alpha.0
1.22.0-alpha.0
v1.21.0
🚀 Features
Restore HTTP payload size limit, make it configurable (Issue #2000)
Early versions of Apollo Router used to rely on a part of the Axum web framework
that imposed a 2 MB limit on the size of the HTTP request body.
Version 1.7 changed to read the body directly, unintentionally removing this limit.
The limit is now restored to help protect against unbounded memory usage, but is now configurable:
preview_operation_limits:
  experimental_http_max_request_bytes: 2000000 # Default value: 2 MB
This limit is checked while reading from the network, before JSON parsing.
Both the GraphQL document and associated variables count toward it.
Before increasing this limit significantly, consider testing performance
in an environment similar to production, especially if some clients are untrusted.
Many concurrent large requests could cause the Router to run out of memory.
By @SimonSapin in #3130
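Checking the limit while reading, rather than after buffering the whole body, can be sketched as follows. This is an illustration of the general technique only; read_limited and PayloadTooLarge are hypothetical names, and the Router itself is written in Rust.

```python
import io


class PayloadTooLarge(Exception):
    """Raised as soon as the body exceeds the configured limit."""


def read_limited(stream, max_bytes: int, chunk_size: int = 8192) -> bytes:
    """Read a request body chunk by chunk, aborting once max_bytes is exceeded."""
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(received)
        received += chunk
        if len(received) > max_bytes:
            # Stop before any JSON parsing happens; memory use stays bounded
            # by roughly max_bytes plus one chunk.
            raise PayloadTooLarge(f"body exceeds {max_bytes} bytes")


# A body of exactly 2,000,000 bytes is accepted; one more byte is rejected.
body = read_limited(io.BytesIO(b"x" * 2_000_000), max_bytes=2_000_000)
```

Because the check runs per chunk, an oversized request is rejected without ever holding the full payload in memory.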
Add support for empty auth prefixes (Issue #2909)
The authentication.jwt plugin now supports empty prefixes for the JWT header. Some companies use prefix-less headers; previously, the authentication plugin rejected requests even with an empty prefix explicitly set, such as:
authentication:
  jwt:
    header_value_prefix: ""
🐛 Fixes
GraphQL introspection errors are now 400 errors (Issue #3090)
Previously, an introspection error during SupergraphService::plan_query() was reported to the client as an HTTP 500 error. The Router now generates a valid GraphQL error for introspection errors and sets the HTTP status to 400.
Before:
StatusCode:500
{"errors":[{"message":"value retrieval failed: introspection error: introspection error : Field \"__schema\" of type \"__Schema!\" must have a selection of subfields. Did you mean \"__schema { ... }\"?","extensions":{"code":"INTERNAL_SERVER_ERROR"}}]}
After:
StatusCode:400
{"errors":[{"message":"introspection error : Field \"__schema\" of type \"__Schema!\" must have a selection of subfields. Did you mean \"__schema { ... }\"?","extensions":{"code":"INTROSPECTION_ERROR"}}]}
Restore missing debug tools in "debug" Docker images (Issue #3249)
Debug Docker images were designed to make use of heaptrack for debugging memory issues. However, this functionality was inadvertently removed when we changed to multi-architecture Docker image builds.
heaptrack functionality is now restored to our debug Docker images.
Federation v2.4.8 (Issue #3217, Issue #3227)
This release bumps the Router's Federation support from v2.4.7 to v2.4.8, which brings in notable query planner fixes from v2.4.8. Of note from those releases, this brings query planner fixes that (per that dependency's changelog):
- Fix bug in the handling of dependencies of subgraph fetches. This bug was manifesting itself as an assertion error thrown during query planning with a message of the form "Root groups X should have no remaining groups unhandled (...)". (apollographql/federation#2622)
- Fix issues in code to reuse named fragments. One of the fixed issues would manifest as an assertion error with a message looking like "Cannot add fragment of condition X (...) to parent type Y (...)". Another would manifest itself by generating an invalid subgraph fetch where a field conflicts with another version of that field that is in a reused named fragment. (apollographql/federation#2619)
These manifested as Router issues #3217 and #3227.
By @renovate and @o0Ignition0o in #3202
Update Rhai to 1.15.0 to fix issue with hanging example test (Issue #3213)
One of our Rhai examples' tests has been regularly hanging in the CI builds. Investigation uncovered a race condition within Rhai itself. This update brings in the fixed version of Rhai and should eliminate the hanging problem and improve build stability.
🛠 Maintenance
chore: split out router events into its own module (PR #3235)
Breaks down ./apollo-router/src/router.rs into its own module ./apollo-router/src/router/mod.rs with a sub-module ./apollo-router/src/router/event/mod.rs that contains all the streams that we combine to start a router (entitlement, schema, reload, configuration, shutdown, more streams to be added).
By @EverlastingBugstopper in #3235
Simplify router service tests (PR #3259)
Parts of the router service creation were generic, to allow mocking, but the TestHarness API allows us to reuse the same code in all cases. Generic types have been removed to simplify the API.
📚 Documentation
Improve example Rhai scripts for JWT Authentication (PR #3184)
Simplifies the example Rhai scripts in the JWT Authentication docs and includes a sample main.rhai file to make it clear how to use all scripts together.
🧪 Experimental
Expose the apollo compiler at the supergraph service level (internal) (PR #3200)
Add a query analysis phase inside the router service, before sending the query through the supergraph plugins. It makes a compiler available to supergraph plugins, to perform deeper analysis of the query. That compiler is then used in the query planner to create the Query object containing selections for response formatting.
This is for internal use only for now, and the APIs are not considered stable.
By @o0Ignition0o and @Geal in #3200
Query planner plugins (internal) (Issue #3150)
Future functionality may need to modify a query between query plan caching and the query planner. This leads to the requirement to provide a query planner plugin capability.
Query planner plugin functionality exposes an ApolloCompiler instance to perform preprocessing of a query before sending it to the query planner.
This is for internal use only for now, and the APIs are not considered stable.
v1.21.0-alpha.1
1.21.0-alpha.1
v1.21.0-alpha.0
1.21.0-alpha.0
v1.20.0
🚀 Features
Configurable histogram buckets for metrics (Issue #2333)
It is now possible to change the default bucketing for histograms generated for metrics:
telemetry:
  metrics:
    common:
      buckets:
        - 0.05
        - 0.10
        - 0.25
        - 0.50
        - 1.00
        - 2.50
        - 5.00
        - 10.00
        - 20.00
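To see how such boundaries partition measurements, the sketch below assigns samples to buckets using the values from the example configuration. It assumes each boundary is an inclusive upper bound, as in OpenTelemetry explicit-bucket histograms; the function names are hypothetical and this is not the Router's own code.

```python
from bisect import bisect_left
from typing import List, Sequence

# Boundaries copied from the configuration example above.
BUCKETS = [0.05, 0.10, 0.25, 0.50, 1.00, 2.50, 5.00, 10.00, 20.00]


def bucket_index(value: float, bounds: Sequence[float] = BUCKETS) -> int:
    """Index of the first bucket whose upper bound is >= value.

    An index equal to len(bounds) is the overflow bucket (value above every bound).
    """
    return bisect_left(bounds, value)


def histogram(samples: Sequence[float], bounds: Sequence[float] = BUCKETS) -> List[int]:
    """Count samples per bucket, with one extra overflow bucket at the end."""
    counts = [0] * (len(bounds) + 1)
    for sample in samples:
        counts[bucket_index(sample, bounds)] += 1
    return counts
```

Nine boundaries therefore produce ten buckets; choosing boundaries close to the latencies you care about gives the finest resolution where it matters.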
🐛 Fixes
Federation v2.4.7 (Issue #3170, Issue #3133)
This release bumps the Router's Federation support from v2.4.6 to v2.4.7, which brings in notable query planner fixes from v2.4.7. Of note from those releases, this brings query planner fixes that (per that dependency's changelog):
- Re-work the code used to try to reuse query named fragments to improve performance (thus sometimes improving query planning performance) (#2604)
- Fix a raised assertion error (again, with a message of form like "Cannot add selection of field X to selection set of parent type Y").
- Fix a rare issue where an interface or union field was not being queried for all the types it should be.
Set the global allocator in the library crate, not just the executable (Issue #3126)
In 1.19, Apollo Router switched to use jemalloc as the global Rust allocator on Linux to reduce memory fragmentation. However, prior to this change this was only occurring in the executable binary provided by the apollo-router crate and custom binaries using the crate as a library were not getting this benefit.
The apollo-router library crate now sets the global allocator so that custom binaries also take advantage of this by default. If some other choice is desired, the global-allocator Cargo feature flag can be disabled in Cargo.toml with:
[dependencies]
apollo-router = {version = "[…]", default-features = false}
Library crates that depend on apollo-router (if any) should also do this in order to leave the choice to the eventual executable. (Cargo default features are only disabled if all dependents specify default-features = false.)
By @SimonSapin in #3157
Add ca-certificates to our Docker image (Issue #3173)
We removed curl from our Docker images to improve security, which meant that our implicit install of ca-certificates (as a dependency of curl) was no longer performed.
This fix reinstates the ca-certificates package explicitly, which is required for the router to be able to process TLS requests.
Helm: Running of helm test no longer fails
Running helm test was generating an error since wget was sending a request without a proper body and expecting an HTTP status response of 2xx. Without the proper body, it expectedly resulted in an HTTP status of 400. By switching to using netcat (or nc) we will now check that the port is up and use that to determine that the router is functional.
By @bbardawilwiser in #3096
Move curl dependency to separate layer in Docker image (Issue #3144)
We've moved curl out of the Docker image we publish. The curl command is only used in the image we produce today for the sake of downloading dependencies. It is never used after that, but we can move it to a separate layer to further remove it from the image.
🛠 Maintenance
Improve cargo-about license checking (Issue #3176)
From the description of this cargo-about PR, it is possible for NOASSERTION identifiers to be added when gathering license information, causing license checks to fail. This change uses the new cargo-about configuration filter-noassertion to eliminate the problem.
v1.19.1
🐛 Fixes
Fix router coprocessor deferred response buffering and change JSON body type from Object to String (Issue #3015)
The current implementation of the RouterResponse processing for coprocessors forces buffering of response data before passing the data to a coprocessor. This is a bug, because deferred responses should be processed progressively with a stream of calls to the coprocessor as each chunk of data becomes available.
Furthermore, the data type was assumed to be valid JSON for both RouterRequest and RouterResponse coprocessor processing. This is also a bug, because data at this stage of processing was never necessarily valid JSON. This is a particular issue when dealing with deferred (when using @defer) RouterResponses.
This change fixes both of these bugs by modifying the router so that coprocessors are invoked with a body payload which is a JSON String, not a JSON Object. Furthermore, the router now processes each chunk of response data separately so that a coprocessor will receive multiple calls (once for each chunk) for a deferred response.
For more details about how this works see the coprocessor documentation.
Experimental: Query plan cache keys now include a hash of the query and operation name (Issue #2998)
Note
This feature is still experimental and not recommended under normal use nor is it validated that caching query plans in a distributed fashion will result in improved performance.
The experimental feature for caching query plans in a distributed store (e.g., Redis) will now create a SHA-256 hash of the query and operation name and include that hash in the cache key, rather than using the operation document as it was previously.
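The idea of keying on a digest rather than on the full operation document can be sketched as below. The "plan:" prefix, the separator byte, and the function name are hypothetical choices for illustration; the Router's actual key layout is not described here.

```python
import hashlib
from typing import Optional


def query_plan_cache_key(query: str, operation_name: Optional[str]) -> str:
    """Derive a fixed-length cache key from the query text and operation name."""
    hasher = hashlib.sha256()
    hasher.update(query.encode("utf-8"))
    # Separator so that ("ab", "c") and ("a", "bc") cannot produce the same digest.
    hasher.update(b"\0")
    hasher.update((operation_name or "").encode("utf-8"))
    return "plan:" + hasher.hexdigest()
```

A digest keeps keys short and uniform in length regardless of how large the operation document is, which suits key-value stores such as Redis.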
Federation v2.4.6 (Issue #3133)
This release bumps the Router's Federation support from v2.4.5 to v2.4.6, which brings in notable query planner fixes from v2.4.6. Of note from those releases, this brings query planner fixes that (per that dependency's changelog):
- Fix assertion error in some overlapping fragment cases. In some cases, when fragments overlap on some sub-selections and some interface field implementation relied on sub-typing, an assertion error could be raised with a message of the form "Cannot add selection of field X to selection set of parent type Y" and this fixes this problem. (apollographql/federation#2594)
- Fix possible fragment-related assertion error during query planning. This prevents a rare case where an assertion with a message of the form "Cannot add fragment of condition X (runtimes: ...) to parent type Y (runtimes: ...)" could fail during query planning. (apollographql/federation#2596)
In addition, the packaging includes dependency updates for bytes, regex, once_cell, tokio, and uuid.
Error redaction for subgraphs now respects disabling it
This follows up on the new ability to selectively disable Studio-bound error redaction which was released in #3011 by fixing a bug which was preventing users from disabling that behavior on subgraphs. Redaction continues to be on by default and both the default behavior and the explicit redact: true option were behaving correctly.
With this fix, the tracing.apollo.errors.subgraph.all.redact option set to false will now transmit the un-redacted error message to Studio.
Evaluate multiple keys matching a JWT criteria (Issue #3017)
In some cases, multiple keys could match what a JWT asks for (both the algorithm, alg, and optional key identifier, kid). Previously, we scored each possible match and only took the one with the highest score. But even then, we could have multiple keys with the same score (e.g., colliding kid between multiple JWKS in tests).
The improved behavior will:
- Return a list of the matching keys instead of the one with the highest score.
- Try them one by one until the JWT is validated, or return an error.
- If some keys were found with the highest possible score (matching alg, with kid present and matching, too), then we only test those keys.
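The selection rules above can be sketched roughly as follows. The matching criteria are a simplified assumption (a key without alg or kid is treated as compatible with any JWT), the Jwk type and function names are hypothetical, and the signature check is passed in as a callback rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Jwk:
    """Simplified stand-in for a key from a JWKS."""
    name: str
    alg: Optional[str] = None
    kid: Optional[str] = None


def candidate_keys(keys: Sequence[Jwk], alg: str, kid: Optional[str]) -> List[Jwk]:
    """Return the keys worth trying for a JWT with the given alg and kid."""
    matching: List[Jwk] = []
    best: List[Jwk] = []
    for key in keys:
        if key.alg is not None and key.alg != alg:
            continue  # algorithm mismatch
        if kid is not None and key.kid is not None and key.kid != kid:
            continue  # key identifier mismatch
        matching.append(key)
        if key.alg == alg and kid is not None and key.kid == kid:
            best.append(key)  # highest possible score: alg and kid both match
    # If any key reached the highest possible score, only those are tested.
    return best or matching


def validate(keys: Sequence[Jwk], alg: str, kid: Optional[str],
             verify: Callable[[Jwk], bool]) -> Jwk:
    """Try candidates one by one until one verifies the JWT, else raise."""
    for key in candidate_keys(keys, alg, kid):
        if verify(key):
            return key
    raise ValueError("no matching key validated the JWT")
```

Two keys sharing the same kid are both tried in order, so a kid collision between JWKS no longer causes a valid token to be rejected.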
🛠 Maintenance
chore(deps): xtask/ dependency updates (PR #3149)
This is effectively running cargo update in the xtask/ directory (our directory of tooling; not runtime components) to bring things more up to date.
This changeset takes extra care to update chrono's features to remove the time dependency which is impacted by CVE-2020-26235, resolving a moderate-severity finding which was appearing in scans. Again, this is not a runtime dependency and there was no actual/known impact to any users.
Improve testability of the state_machine in integration tests
We have introduced a TestRouterHttpServer for writing more fine-grained integration tests in the Router core for the behaviors of the state machine.
By @o0Ignition0o in #3099
v1.19.0
Note
This release focused a notable amount of effort on improving both CPU usage and memory utilization/fragmentation. Our testing and pre-release feedback has been overwhelmingly positive. 🙌
🚀 Features
GraphOS Enterprise: require_authentication option to reject unauthenticated requests (Issue #2866)
While the authentication plugin validates queries with JWT, it does not reject unauthenticated requests, and leaves that to other layers. This allows co-processors to handle other authentication methods, and plugins at later layers to authorize the request or not. Typically, this was done in Rhai.
This now adds an option to the Router's YAML configuration to reject unauthenticated requests. It can be used as follows:
authorization:
  require_authentication: true
The plugin will check for the presence of the apollo_authentication::JWT::claims key in the request context as proof that the request is authenticated.
🐛 Fixes
Prevent span attributes from being formatted to write logs
We do not show span attributes in our logs, but the log formatter still spends time formatting them to a string, even when there will be no logs written for the trace. This adds the NullFieldFormatter that entirely avoids formatting the attributes to improve performance.
Federation v2.4.5
This release bumps the Router's Federation support from v2.4.2 to v2.4.5, which brings in notable query planner fixes from v2.4.3 and v2.4.5. Federation v2.4.4 will not exist due to a publishing failure. Of note from those releases, this brings query planner fixes that:
- Improves the heuristics used to try to reuse the query named fragments in subgraph fetches. Said fragment will be reused more often, which can lead to smaller subgraph queries (and hence overall faster processing). (apollographql/federation#2541)
- Fix potential assertion error during query planning in some multi-field @requires case. This error could be triggered when a field in a @requires depended on another field that was also part of that same requires (for instance, if a field has a @requires(fields: "id otherField") and that id is also a key necessary to reach the subgraph providing otherField). (#2575)
  The assertion error thrown in that case contained the message "Root groups (...) should have no remaining groups unhandled (...)"
Add support for throwing GraphQL errors in Rhai responses (Issue #3069)
It's possible to throw a GraphQL error from Rhai when processing a request. This extends the capability to include errors when processing a response.
Refer to the Terminating client requests section of the Rhai API documentation to learn how to throw GraphQL payloads.
Use a parking-lot mutex in Context to avoid contention (Issue #2751)
Request context requires synchronized access to the busy timer, and previously we used a futures aware mutex for that, but those are susceptible to contention. This replaces that mutex with a parking-lot synchronous mutex that is much faster.
Config and schema reloads now use async IO (Issue #2613)
If you were using local schema or config then previously the Router was performing blocking IO in an async thread. This could have caused stalls to serving requests.
The Router now uses async IO for all config and schema reloads.
Fixing the above surfaced an issue with the experimental force_hot_reload feature introduced for testing. This has also been fixed and renamed to force_reload.
 experimental_chaos:
-  force_hot_reload: 1m
+  force_reload: 1m
By @BrynCooke in #3016
Improve subgraph coprocessor context processing (Issue #3058)
Each call to a subgraph co-processor could update the entire request context as a single operation. This is racy and could lead to difficult to predict context modifications depending on the order in which subgraph requests and responses are processed by the router.
This fix modifies the router so that subgraph co-processor context updates are merged within the existing context. This is still racy, but means that subgraphs are only racing to perform updates at the context key level, rather than across the entire context.
Future enhancements will provide a more comprehensive mechanism that will support some form of sequencing or change arbitration across subgraphs.
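The difference between replacing the context and merging into it can be illustrated with plain dictionaries. This is a conceptual sketch only; the function names are hypothetical and the Router's context is not a Python dict.

```python
from typing import Any, Dict

Context = Dict[str, Any]


def replace_context(current: Context, returned: Context) -> Context:
    """Old behavior (sketch): the coprocessor's copy overwrites the whole context."""
    return dict(returned)


def merge_context(current: Context, returned: Context) -> Context:
    """New behavior (sketch): only the keys the coprocessor returned are updated."""
    merged = dict(current)
    merged.update(returned)
    return merged


# Two subgraph coprocessors each start from the same snapshot and add one key.
snapshot = {"trace_id": "abc"}
from_reviews = {**snapshot, "reviews_seen": True}
from_products = {**snapshot, "products_seen": True}

# Whole-context replacement: the last response wins and the other key is lost.
replaced = replace_context(replace_context(snapshot, from_reviews), from_products)

# Key-level merge: both additions survive, whatever the arrival order.
merged = merge_context(merge_context(snapshot, from_reviews), from_products)
```

Two coprocessors writing the same key still race under the merge approach, which is the residual raciness noted above.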
🛠 Maintenance
Add private component to the Context structure (Issue #2800)
There's a cost in using the Context structure during a request's lifecycle, due to JSON serialization and deserialization incurred when doing inter-plugin communication (e.g., between Rhai/coprocessors and Rust). For internal router usage, we now use a more efficient structure that avoids serialization costs of our private contextual properties which do not need to be exposed to plugins.
Adds an integration test for all YAML configuration files in ./examples (Issue #2932)
Adds an integration test that iterates over ./examples looking for .yaml files that don't have a Cargo.toml or .skipconfigvalidation sibling file, and then runs setup_router_and_registry on them, failing fast on any errors along the way.
By @EverlastingBugstopper in #3097
Improve memory fragmentation and resource consumption by switching to jemalloc as the memory allocator on Linux (PR #2882)
Detailed memory investigation revealed significant memory fragmentation when using the default allocator, glibc, on Linux. Performance testing and flame-graph analysis suggested that using jemalloc on Linux would yield notable performance improvements. In our tests, performance was about 35% faster than with the default allocator, on account of spending less time managing memory fragmentation.
Not everyone will see a 35% performance improvement. Depending on your usage patterns, you may see more or less than this. If you see a regression, please file an issue with details.
We have no reason to believe that there are allocation problems on other platforms, so this change is confined to Linux.
Improve performance by avoiding temporary allocations creating response paths (PR #2854)
Response formatting generated many temporary allocations while creating response paths. By making a reference based type to hold these paths, we can prevent those allocations and improve performance.