OTEL bottleneck, ability to programmatically decide which spans we want to create #6542

samuelAndalon · 2025-01-14T00:01:45Z

Describe the solution you'd like

Apollo Router can sustain high throughput: with very few instances it can handle very large workloads. However, we have identified a bottleneck in OpenTelemetry and the batch processor.

The query_planning span wraps the whole query planning service, which includes cache lookups. Those take microseconds, so the router ends up exporting hundreds of thousands of spans, and the OpenTelemetry channel cannot send all of the spans that are actually created (bummer that OTEL does not implement backpressure).

For our use case, these spans are completely useless: they give us no useful insight and just consume memory and CPU on the node where our router instance runs, as well as memory and CPU of the Datadog agent that sends the spans to the Datadog API.
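To illustrate the failure mode described above, here is a minimal sketch using only the Rust standard library. It is not the router's or OpenTelemetry's actual code: `simulate_span_burst` and `QueueStats` are hypothetical names, and the bounded `sync_channel` merely stands in for an exporter queue that discards spans when full instead of slowing the producer down.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Outcome of pushing a burst of spans into a bounded queue
/// while nothing is draining it.
pub struct QueueStats {
    pub queued: usize,
    pub dropped: usize,
}

/// Hypothetical stand-in for a span exporter queue (assumption, not router code).
/// Spans are represented by their names only.
pub fn simulate_span_burst(produced: usize, capacity: usize) -> QueueStats {
    // Keep the receiver alive so the channel stays connected, but never read from it.
    let (tx, _rx) = sync_channel::<&'static str>(capacity);
    let mut stats = QueueStats { queued: 0, dropped: 0 };

    for _ in 0..produced {
        match tx.try_send("query_planning") {
            Ok(()) => stats.queued += 1,
            // Queue full: the span is discarded rather than blocking the producer,
            // which is what "no backpressure" means in practice.
            Err(TrySendError::Full(_)) => stats.dropped += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    stats
}
```

With a burst larger than the queue capacity and a slow consumer, everything beyond the capacity is lost, no matter how cheap each individual span was to create.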

It's not only query_planning; there are other spans that are useless for our use case:

execution
http_request
fetch
parallel
sequence

pretty much all query plan nodes.
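As a rough sketch of what "deciding programmatically" could look like, the snippet below models a name-based policy that is consulted before a span is created. This is an assumption about the shape of the feature, not an existing router API: `SpanPolicy`, `should_create`, and `default_policy` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical policy (not a router API): decides by span name
/// whether a span should be created at all.
pub struct SpanPolicy {
    suppressed: HashSet<&'static str>,
}

impl SpanPolicy {
    /// Build a policy from the list of span names to suppress.
    pub fn new(suppressed: &[&'static str]) -> Self {
        Self {
            suppressed: suppressed.iter().copied().collect(),
        }
    }

    /// Returns false for spans on the suppression list, true otherwise.
    pub fn should_create(&self, span_name: &str) -> bool {
        !self.suppressed.contains(span_name)
    }
}

/// Policy covering the span names listed in this issue.
pub fn default_policy() -> SpanPolicy {
    SpanPolicy::new(&[
        "query_planning",
        "execution",
        "http_request",
        "fetch",
        "parallel",
        "sequence",
    ])
}
```

The point of a check like this is that a suppressed span is never allocated or queued, rather than being created and then filtered out later in the export pipeline.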

Describe alternatives you've considered

We have tried multiple configurations for the batch_processor.

We have tried changing the log level of modules using the RUST_LOG env variable, which works for most use cases, but changing the log level of apollo_router::services::http::service breaks the propagation of headers.

After some debugging, we found that the issue is here:

https://github.com/apollographql/router/blob/dev/apollo-router/src/services/http/service.rs#L276

The propagation of headers relies on the log level of modules, which IMHO should be decoupled.


BrynCooke · 2025-01-14T11:26:25Z

I think it would be possible to fix the propagation even if the http span is suppressed.

samuelAndalon added the raised by user label Jan 14, 2025
