Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of query hashing by using a precomputed schema hash #6622

Merged
merged 41 commits into from
Jan 23, 2025

Conversation

IvanGoncharov
Copy link
Member

@IvanGoncharov IvanGoncharov commented Jan 22, 2025

This PR reverts the query hashing back to a simpler algorithm, which is faster and has more predictable CPU and memory costs.

Previous vs. New Algorithm

  • Both the old and new algorithms base their hash on the query, operation name, and supergraph schema.
  • This hash is critical in various parts of the Router (e.g., caching query plans and entities).

Key Change

  • The new algorithm always hashes the entire schema (as an SDL string).
  • The old algorithm tried to hash only the parts of the schema referenced by the query. While it allowed the hash to stay unchanged for irrelevant schema modifications, it introduced significant complexity and risk of collisions (partially fixed in Fix the query hashing algorithm #6205).
  • Those fixes increased CPU and memory usage (improved somewhat in Avoid re-computing implementers_map in QueryHashVisitor (1.59) #6569), revealing scalability issues for large queries and schemas.

Benefits of the New Approach

  • Simple, predictable, and less prone to collisions (limited by SHA256).
  • Since each query depends on the same supergraph schema, the algorithm computes the schema hash only once and reuses it.

Implementation Details

  • A new SchemaHash type has been introduced and the existing QueryHash type refined.
  • QueryHash is now created via SchemaHash::operation_hash with a query string and optional operation name.
  • This ensures the hash always includes the full schema, plus the full query and operation name.

Future Plan

  • This PR still allows some legacy shortcuts in QueryHash creation for now, but they will be removed in subsequent PRs.

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

SimonSapin and others added 30 commits January 20, 2025 14:06
The `QueryHash` type was reused here, but it's not hashing a query at
all.
This is just to keep it compiling right now, we can remove it at the end
(it's probably right to do so)
Adding it to all the transitive usage sites is a lot of work.

Renamed it for clarity.
It would be better to make everything not-pub, but let's do that
separately.
It's vital that we populate `configuration.apollo` before we try to
populate `configuration.apollo.schema_id`.
@IvanGoncharov IvanGoncharov changed the title Improve performance of query hashing by using precomputed schema hash Improve performance of query hashing by using a precomputed schema hash Jan 23, 2025
@IvanGoncharov IvanGoncharov marked this pull request as ready for review January 23, 2025 00:43
@IvanGoncharov IvanGoncharov requested a review from a team as a code owner January 23, 2025 00:43
@@ -313,6 +313,8 @@ fn setup_from_mocks(
configuration: serde_json::Value,
mocks: &[(&'static str, &'static str)],
) -> TestHarness<'static> {
let _ = rustls::crypto::aws_lc_rs::default_provider().install_default();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goto-bus-stop This looks like an unintentional change introduced by the merge.
Is it correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests failed without it. We have to do this in several places after the hyper 1.0 upgrade on dev, but it's a bit weird that these tests were passing on dev before :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because tonic changed. We need to call this.

@@ -354,6 +354,8 @@ fn setup_from_mocks(
configuration: serde_json::Value,
mocks: &[(&'static str, &'static str)],
) -> TestHarness<'static> {
let _ = rustls::crypto::aws_lc_rs::default_provider().install_default();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goto-bus-stop looks like the same happened here

Comment on lines +507 to +510
// FIXME: It seems bad that you can create an empty hash easily and use it in security-critical
// places. This impl should be deleted outright and we should update usage sites.
// If the query hash is truly not required to contain data in those usage sites, we should use
// something like an Option instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems potentially quite bad. File an issue so we don’t forget it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the impact of fixing this now? Adding something that is a security risk doesn't sound good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrynCooke Actual call to default happening here:
https://github.com/apollographql/router/pull/6622/files#diff-7108edf02483a7de6d8c37b938df0634afc78590242b4fc94709c9dd2e626fb4R108
Previously, it was just less noticeable, and my change and @goto-bus-stop comments just make it stand out more.
Fixing it will significantly increase scope of this PR but @SimonSapin is right, we need to open an issue about it.
I will do that right now

apollo-router/src/query_planner/fetch.rs Outdated Show resolved Hide resolved
apollo-router/src/query_planner/tests.rs Show resolved Hide resolved
@IvanGoncharov IvanGoncharov merged commit 8c81327 into dev Jan 23, 2025
16 checks passed
@IvanGoncharov IvanGoncharov deleted the i1g/query_hash_cleanup branch January 23, 2025 15:43
@IvanGoncharov
Copy link
Member Author

@Mergifyio backport 1.59.2

Copy link
Contributor

mergify bot commented Jan 23, 2025

backport 1.59.2

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jan 23, 2025
…sh (#6622)

Co-authored-by: Simon Sapin <[email protected]>
Co-authored-by: Renée Kooi <[email protected]>
(cherry picked from commit 8c81327)

# Conflicts:
#	apollo-router/src/query_planner/bridge_query_planner.rs
#	apollo-router/src/query_planner/caching_query_planner.rs
#	apollo-router/src/query_planner/fetch.rs
#	apollo-router/src/query_planner/snapshots/apollo_router__query_planner__bridge_query_planner__tests__plan_root.snap
#	apollo-router/src/spec/query/change.rs
#	apollo-router/src/spec/schema.rs
#	apollo-router/tests/integration/redis.rs
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_disabled.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_generate_query_fragments.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_list_of_list.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_list_of_list_of_list.snap
#	apollo-router/tests/snapshots/type_conditions___test_type_conditions_enabled_shouldnt_make_article_fetch.snap
@BrynCooke BrynCooke added backport-1.x Backport this PR to 1.x and removed backport-1.x Backport this PR to 1.x labels Jan 23, 2025
@goto-bus-stop
Copy link
Member

@Mergifyio backport 1.x

Copy link
Contributor

mergify bot commented Jan 23, 2025

backport 1.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jan 23, 2025
…sh (#6622)

Co-authored-by: Simon Sapin <[email protected]>
Co-authored-by: Renée Kooi <[email protected]>
(cherry picked from commit 8c81327)

# Conflicts:
#	apollo-router/src/query_planner/fetch.rs
#	apollo-router/src/spec/schema.rs
@BrynCooke
Copy link
Contributor

@Mergifyio refresh

Copy link
Contributor

mergify bot commented Jan 23, 2025

refresh

✅ Pull request refreshed

abernix pushed a commit that referenced this pull request Jan 23, 2025
…sh (backport #6622) (#6631)

Co-authored-by: Ivan Goncharov <[email protected]>
Co-authored-by: Renée Kooi <[email protected]>
abernix pushed a commit that referenced this pull request Jan 24, 2025
…sh (backport #6622) (#6633)

Co-authored-by: Ivan Goncharov <[email protected]>
Co-authored-by: Renée Kooi <[email protected]>
@abernix abernix mentioned this pull request Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-1.x Backport this PR to 1.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants