Improve performance of query hashing by using a precomputed schema hash (backport #6622) #6631

Merged
merged 5 commits into from
Jan 23, 2025

Conversation

mergify[bot]
@mergify mergify bot commented Jan 23, 2025

This PR reverts query hashing to a simpler algorithm that is faster and has more predictable CPU and memory costs.

Previous vs. New Algorithm

  • Both the old and new algorithms base their hash on the query, operation name, and supergraph schema.
  • This hash is critical in various parts of the Router (e.g., caching query plans and entities).

Key Change

  • The new algorithm always hashes the entire schema (as an SDL string).
  • The old algorithm tried to hash only the parts of the schema referenced by the query. This kept the hash unchanged across irrelevant schema modifications, but it introduced significant complexity and a risk of collisions (partially fixed in #6205, "Fix the query hashing algorithm").
  • Those fixes increased CPU and memory usage (improved somewhat in #6569, "Avoid re-computing implementers_map in QueryHashVisitor (1.59)"), revealing scalability issues for large queries and schemas.
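
The trade-off described above can be sketched as follows. This is a minimal illustration, not the Router's code: `full_schema_key` is a hypothetical helper, the schemas and query are made up, and `DefaultHasher` from the standard library stands in for SHA-256 so that the snippet needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hashes the full schema SDL, the query text, and the
/// optional operation name into a single key.
/// ASSUMPTION: the PR uses SHA-256; `DefaultHasher` is a std-only stand-in.
fn full_schema_key(schema_sdl: &str, query: &str, operation_name: Option<&str>) -> u64 {
    let mut hasher = DefaultHasher::new();
    schema_sdl.hash(&mut hasher);
    query.hash(&mut hasher);
    operation_name.hash(&mut hasher);
    hasher.finish()
}

/// Returns true when adding a type the query never references still
/// produces a different key, because the whole SDL string is hashed.
fn unrelated_change_invalidates_key() -> bool {
    let query = "query GetUser { user { id } }";
    let schema_v1 = "type Query { user: User } type User { id: ID! }";
    // Same schema plus an extra `Review` type that the query does not touch.
    let schema_v2 =
        "type Query { user: User } type User { id: ID! } type Review { body: String }";

    full_schema_key(schema_v1, query, Some("GetUser"))
        != full_schema_key(schema_v2, query, Some("GetUser"))
}
```

With whole-schema hashing, any schema edit yields new keys for every query, including queries the edit does not affect. That is the cost accepted in exchange for a simpler algorithm.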

Benefits of the New Approach

  • Simple, predictable, and less prone to collisions (collision resistance is limited only by SHA-256).
  • Since each query depends on the same supergraph schema, the algorithm computes the schema hash only once and reuses it.

Implementation Details

  • A new SchemaHash type has been introduced and the existing QueryHash type refined.
  • QueryHash is now created via SchemaHash::operation_hash with a query string and optional operation name.
  • This ensures the hash always includes the full schema, plus the full query and operation name.
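
A rough sketch of how the two types could fit together is shown below. Only the names `SchemaHash`, `QueryHash`, and `SchemaHash::operation_hash` come from this description. The constructor `SchemaHash::new`, the field layout, the exact signatures, and the use of the standard library's `DefaultHasher` in place of SHA-256 are all assumptions made to keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Digest of the whole supergraph SDL, computed once per schema.
/// ASSUMPTION: a u64 from `DefaultHasher` replaces the real SHA-256 digest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SchemaHash(u64);

/// Per-operation hash. The inner field is private, so outside this module
/// the only way to obtain one is `SchemaHash::operation_hash`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct QueryHash(u64);

impl SchemaHash {
    /// Hypothetical constructor: hashes the full SDL string once.
    pub fn new(schema_sdl: &str) -> Self {
        let mut hasher = DefaultHasher::new();
        schema_sdl.hash(&mut hasher);
        SchemaHash(hasher.finish())
    }

    /// Combines the precomputed schema digest with the full query text and
    /// the optional operation name. The SDL itself is not re-read here.
    pub fn operation_hash(&self, query: &str, operation_name: Option<&str>) -> QueryHash {
        let mut hasher = DefaultHasher::new();
        self.0.hash(&mut hasher);
        query.hash(&mut hasher);
        operation_name.hash(&mut hasher);
        QueryHash(hasher.finish())
    }
}

/// Illustrative use as a cache key: one `SchemaHash` serves many operations.
/// Returns the number of distinct cache entries created.
pub fn example_cache_size() -> usize {
    let schema = SchemaHash::new("type Query { me: String, you: String }");
    let mut plans: HashMap<QueryHash, &str> = HashMap::new();
    plans.insert(schema.operation_hash("query A { me }", Some("A")), "plan for A");
    plans.insert(schema.operation_hash("query B { you }", Some("B")), "plan for B");
    // Re-hashing the same operation maps to the existing entry.
    plans.insert(schema.operation_hash("query A { me }", Some("A")), "plan for A");
    plans.len()
}
```

Because `operation_hash` takes `&self`, the per-request cost depends on the query length rather than the schema size, and the type system makes it hard to build a `QueryHash` that omits the schema.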

Future Plan

  • This PR still allows some legacy shortcuts in QueryHash creation for now, but they will be removed in subsequent PRs.

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing [3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

[ROUTER-978]: https://apollographql.atlassian.net/browse/ROUTER-978?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ


This is an automatic backport of pull request #6622 done by Mergify.

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

…sh (#6622)

Co-authored-by: Simon Sapin <[email protected]>
Co-authored-by: Renée Kooi <[email protected]>
(cherry picked from commit 8c81327)

# Conflicts:
#	apollo-router/src/query_planner/bridge_query_planner.rs
#	apollo-router/src/query_planner/caching_query_planner.rs
#	apollo-router/src/query_planner/fetch.rs
#	apollo-router/src/query_planner/snapshots/apollo_router__query_planner__bridge_query_planner__tests__plan_root.snap
#	apollo-router/src/spec/query/change.rs
#	apollo-router/src/spec/schema.rs
#	apollo-router/tests/integration/redis.rs
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_disabled.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_generate_query_fragments.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_list_of_list.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_list_of_list_of_list.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_shouldnt_make_article_fetch.snap
@mergify mergify bot requested review from a team as code owners January 23, 2025 15:45

@apollo-cla
@mergify[bot]: Thank you for submitting a pull request! Before we can merge it, you'll need to sign the Apollo Contributor License Agreement here: https://contribute.apollographql.com/

@svc-apollo-docs
Collaborator

svc-apollo-docs commented Jan 23, 2025

⚠️ Docs preview not attached to branch

The preview was not built because the PR's base branch 1.59.2 is not in the list of sources.

An Apollo team member can comment one of the following commands to dictate which branch to attach the preview to:

  • !docs set-base-branch dev
  • !docs set-base-branch 1.x

Build ID: 201aa89499c874e52db8a1e9

@router-perf
router-perf bot commented Jan 23, 2025

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@abernix abernix merged commit 7161c2e into 1.59.2 Jan 23, 2025
15 checks passed
@abernix abernix deleted the mergify/bp/1.59.2/pr-6622 branch January 23, 2025 18:58
6 participants