Describe the solution you'd like
Apollo Router is capable of high throughput: with very few instances, the router can handle very large workloads. However, we have identified a bottleneck in OpenTelemetry and the batch processor.
The query_planning span wraps the whole query_planning service, which includes cache lookups. Those take microseconds, and as a result the router exports hundreds of thousands of spans, so the OpenTelemetry channel cannot send all the spans that are actually created (bummer that OTel does not implement backpressure).
For our use case, these spans are completely useless. They don't give us any useful insight; they just consume memory and CPU on the node where our router instance is running, and memory and CPU of the Datadog agent that sends the spans to the Datadog API.
It's not only query_planning. There are other spans that are useless for our use cases:
execution
http_request
fetch
parallel
sequence
pretty much all query plan nodes.
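To make the failure mode concrete, here is a minimal sketch of a bounded queue whose producer never blocks. It assumes the batch processor behaves like a fixed-size channel that rejects new items when full; the function name, the capacity of 2,048 and the span count are illustrative assumptions, not the OpenTelemetry SDK's actual code or the router's defaults.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to enqueue `total` span names into a bounded queue of size `capacity`
/// without ever blocking the producer. Returns `(queued, dropped)`.
/// Hypothetical helper: it only models "no backpressure, drop when full".
fn enqueue_without_backpressure(total: usize, capacity: usize) -> (usize, usize) {
    // `_rx` is kept alive but never drained, modelling an exporter that
    // cannot keep up with the rate at which spans are produced.
    let (tx, _rx) = sync_channel::<&'static str>(capacity);
    let mut queued = 0;
    let mut dropped = 0;
    for _ in 0..total {
        match tx.try_send("query_planning") {
            Ok(()) => queued += 1,
            // The queue is full: the span is silently lost instead of
            // slowing down the request path.
            Err(TrySendError::Full(_)) => dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (queued, dropped)
}

fn main() {
    let (queued, dropped) = enqueue_without_backpressure(100_000, 2_048);
    println!("queued={queued} dropped={dropped}");
    // prints "queued=2048 dropped=97952"
}
```

In this model, tuning the queue size only moves the point at which spans start being dropped; producing fewer spans in the first place is what reduces the loss.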
Describe alternatives you've considered
We have tried multiple configurations for the batch_processor.
We have tried changing the log level of modules using the RUST_LOG env variable. This works for most use cases, but changing the log level of apollo_router::services::http::service breaks the propagation of headers.
After some debugging, we found that the issue is here:
https://github.com/apollographql/router/blob/dev/apollo-router/src/services/http/service.rs#L276
The propagation of headers relies on the log level of modules, which IMHO should be decoupled.
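The coupling described above can be sketched with a small stand-alone model. This is not the router's code: `TraceContext`, `Span`, `propagate_coupled`, `propagate_decoupled` and the `x-trace-id` / `x-parent-id` header names are all hypothetical, and the sketch assumes that a span disabled by the log filter carries no context to inject.

```rust
use std::collections::HashMap;

/// Hypothetical trace context: the IDs a propagator would write into headers.
#[derive(Clone, Copy)]
struct TraceContext {
    trace_id: u64,
    span_id: u64,
}

/// Hypothetical span: it has a context only if the log filter enabled it.
struct Span {
    context: Option<TraceContext>,
}

/// Creates a child span of `parent`; a filtered-out span has no context.
fn new_span(parent: TraceContext, span_id: u64, enabled: bool) -> Span {
    Span {
        context: enabled.then(|| TraceContext { trace_id: parent.trace_id, span_id }),
    }
}

/// Writes the context into outgoing headers (illustrative header names).
fn inject(ctx: TraceContext, headers: &mut HashMap<String, String>) {
    headers.insert("x-trace-id".to_string(), ctx.trace_id.to_string());
    headers.insert("x-parent-id".to_string(), ctx.span_id.to_string());
}

/// Coupled: headers come only from the new span, so disabling the span
/// through the log level also disables propagation.
fn propagate_coupled(span: &Span) -> HashMap<String, String> {
    let mut headers = HashMap::new();
    if let Some(ctx) = span.context {
        inject(ctx, &mut headers);
    }
    headers
}

/// Decoupled: fall back to the parent context when the span is disabled,
/// so propagation no longer depends on the log level.
fn propagate_decoupled(span: &Span, parent: TraceContext) -> HashMap<String, String> {
    let mut headers = HashMap::new();
    inject(span.context.unwrap_or(parent), &mut headers);
    headers
}

fn main() {
    let parent = TraceContext { trace_id: 42, span_id: 1 };
    let disabled = new_span(parent, 2, false);
    println!("coupled: {} headers", propagate_coupled(&disabled).len());
    println!("decoupled: {} headers", propagate_decoupled(&disabled, parent).len());
}
```

With the span filtered out, the coupled variant sends no trace headers at all, while the decoupled variant still forwards the parent's trace ID, which is the behavior this issue asks for.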