Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: query metadata to return index usage #29230

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

orian
Copy link
Contributor

@orian orian commented Feb 26, 2025

Problem

No way to understand why queries are slow

Changes

Added isUsingIndices to HogQL Metadata Response.

Doing a full check is quite a complex task, what is done here is a verification that there are at least some conditions on primary keys.

Screenshot 2025-02-26 at 01 32 02

Does this work well for both Cloud and self-hosted?

Yes.

How did you test this code?

Have tests. Run locally.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

Added a new feature to analyze and report index usage in ClickHouse queries, helping users identify why queries might be slow by examining execution plans.

  • Added QueryIndexUsage enum with values "undecisive", "no", "partial", and "yes" to both frontend schema and backend schema.py
  • Implemented core logic in posthog/clickhouse/explain.py to analyze ClickHouse query plans and determine index usage effectiveness
  • Extended HogQLMetadataResponse interface with isUsingIndices property to expose index usage information to users
  • Added specialized handling for specific tables like 'sharded_events' and 'person_distinct_id_overrides' with custom index usage detection
  • Created comprehensive test cases in test_explain.py covering proper indexing, missing timestamp conditions, and complex production queries

6 file(s) reviewed, 3 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines +11 to +12
for subplan in plan.get("Plans", []):
reads = reads + find_all_reads(subplan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using list concatenation with the += operator instead of creating a new list each time. This would be more efficient for large query plans.

Comment on lines +64 to +81
result["use"] = QueryIndexUsage.UNDECISIVE
minMax = False
partition = False
primary_key = False
for index in indexes:
if index.get("Condition", "") == "true":
continue
index_type = index.get("Type", "")
if index_type == "MinMax":
minMax = selected_less_granules(index)
elif index_type == "Partition":
partition = selected_less_granules(index)
elif index_type == "PrimaryKey":
primary_key = len(index.get("Keys", [])) > 1 and selected_less_granules(index)
if (minMax or partition) and primary_key:
result["use"] = QueryIndexUsage.YES

return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: This default case logic duplicates the code from the 'sharded_events' case. Consider refactoring to avoid duplication by extracting this logic into a separate function.

Comment on lines +88 to +99
try:
explain_results = sync_execute(
f"EXPLAIN PLAN indexes=1,json=1 {clickhouse_sql}",
context.values,
with_column_types=True,
# workload=workload,
team_id=team.pk,
readonly=True,
)
response.isUsingIndices = extract_index_usage_from_plan(explain_results[0][0][0])
except:
response.isUsingIndices = QueryIndexUsage.UNDECISIVE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: The bare except block silently catches all exceptions and sets the index usage to UNDECISIVE. Consider catching specific exceptions or at least logging the error for debugging purposes.

Copy link
Contributor

Size Change: 0 B

Total Size: 9.72 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 9.72 MB

compressed-size-action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant