feat: query metadata to return index usage #29230
base: master
Conversation
PR Summary
Added a new feature to analyze and report index usage in ClickHouse queries, helping users identify why queries might be slow by examining execution plans.
- Added `QueryIndexUsage` enum with values "undecisive", "no", "partial", and "yes" to both the frontend schema and backend `schema.py`
- Implemented core logic in `posthog/clickhouse/explain.py` to analyze ClickHouse query plans and determine index usage effectiveness
- Extended the `HogQLMetadataResponse` interface with an `isUsingIndices` property to expose index usage information to users
- Added specialized handling for specific tables like `sharded_events` and `person_distinct_id_overrides` with custom index usage detection
- Created comprehensive test cases in `test_explain.py` covering proper indexing, missing timestamp conditions, and complex production queries
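The enum described above might look like the following in the backend schema. This is a hedged sketch based only on the values listed in the summary; the actual definition in `schema.py` may differ:

```python
from enum import Enum


class QueryIndexUsage(str, Enum):
    # The four states named in the PR summary
    UNDECISIVE = "undecisive"  # analysis could not reach a conclusion
    NO = "no"                  # query scans without useful index pruning
    PARTIAL = "partial"        # only some conditions hit an index
    YES = "yes"                # partition/primary-key indexes pruned data
```

Subclassing `str` keeps the values JSON-serializable, which is convenient when the enum is mirrored in a frontend schema.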
6 file(s) reviewed, 3 comment(s)
```python
for subplan in plan.get("Plans", []):
    reads = reads + find_all_reads(subplan)
```
style: Consider using the `+=` operator (in-place list extension) instead of creating a new list with `+` on each iteration. This would be more efficient for large query plans.
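The suggestion can be illustrated with a small sketch, assuming `find_all_reads` recursively collects read-step nodes from a nested plan dict (the node-type check here is illustrative, not the PR's exact logic):

```python
def find_all_reads(plan: dict) -> list[dict]:
    """Recursively collect read steps from a nested ClickHouse plan node."""
    reads = [plan] if "read" in plan.get("Node Type", "").lower() else []
    for subplan in plan.get("Plans", []):
        # `+=` extends the existing list in place instead of allocating
        # a new list on every recursive step, as `reads = reads + ...` does.
        reads += find_all_reads(subplan)
    return reads
```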
```python
result["use"] = QueryIndexUsage.UNDECISIVE
minMax = False
partition = False
primary_key = False
for index in indexes:
    if index.get("Condition", "") == "true":
        continue
    index_type = index.get("Type", "")
    if index_type == "MinMax":
        minMax = selected_less_granules(index)
    elif index_type == "Partition":
        partition = selected_less_granules(index)
    elif index_type == "PrimaryKey":
        primary_key = len(index.get("Keys", [])) > 1 and selected_less_granules(index)
if (minMax or partition) and primary_key:
    result["use"] = QueryIndexUsage.YES

return result
```
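The verdict above hinges on a granule comparison. Here is a hedged sketch of what `selected_less_granules` might check; the real helper in `explain.py` may differ, though `Initial Granules` and `Selected Granules` are the field names ClickHouse emits for `EXPLAIN PLAN indexes=1, json=1`:

```python
def selected_less_granules(index: dict) -> bool:
    # EXPLAIN indexes=1 reports granule counts before and after an index
    # is applied; the index only helped if it pruned at least one granule.
    initial = index.get("Initial Granules", 0)
    selected = index.get("Selected Granules", 0)
    return initial > 0 and selected < initial
```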
style: This default case logic duplicates the code from the 'sharded_events' case. Consider refactoring to avoid duplication by extracting this logic into a separate function.
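One way to follow the reviewer's suggestion is to extract the shared loop into a helper that both the `sharded_events` branch and the default branch call. This is an illustrative sketch (the function name is an assumption, and the granule predicate is passed in rather than re-implemented):

```python
from typing import Callable


def evaluate_indexes(
    indexes: list[dict],
    selected_less_granules: Callable[[dict], bool],
) -> bool:
    """Shared verdict logic: a MinMax or Partition index pruned granules,
    AND a multi-column primary key pruned granules too."""
    min_max = partition = primary_key = False
    for index in indexes:
        if index.get("Condition", "") == "true":
            continue  # a trivially-true condition prunes nothing
        index_type = index.get("Type", "")
        if index_type == "MinMax":
            min_max = selected_less_granules(index)
        elif index_type == "Partition":
            partition = selected_less_granules(index)
        elif index_type == "PrimaryKey":
            primary_key = len(index.get("Keys", [])) > 1 and selected_less_granules(index)
    return (min_max or partition) and primary_key
```

Each table-specific branch would then only decide which `indexes` list to pass and how to map the boolean back onto `QueryIndexUsage`.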
```python
try:
    explain_results = sync_execute(
        f"EXPLAIN PLAN indexes=1,json=1 {clickhouse_sql}",
        context.values,
        with_column_types=True,
        # workload=workload,
        team_id=team.pk,
        readonly=True,
    )
    response.isUsingIndices = extract_index_usage_from_plan(explain_results[0][0][0])
except:
    response.isUsingIndices = QueryIndexUsage.UNDECISIVE
```
style: The bare except block silently catches all exceptions and sets the index usage to UNDECISIVE. Consider catching specific exceptions or at least logging the error for debugging purposes.
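Following the reviewer's suggestion, the fallback could catch `Exception` (rather than everything) and log the failure. A sketch under those assumptions, with the explain call and extractor passed in as callables to keep it self-contained:

```python
import logging

logger = logging.getLogger(__name__)


def safe_index_usage(run_explain, extract):
    """Run EXPLAIN and fall back to 'undecisive', but keep the error visible."""
    try:
        return extract(run_explain())
    except Exception:
        # A bare `except:` would also trap KeyboardInterrupt/SystemExit;
        # logger.exception records the traceback for debugging.
        logger.exception("Failed to extract index usage from EXPLAIN plan")
        return "undecisive"
```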
Size Change: 0 B | Total Size: 9.72 MB
Problem
There is no way to understand why queries are slow.
Changes
Added `isUsingIndices` to the HogQL metadata response.
Doing a full index-usage check is quite a complex task; what is done here is a verification that there are at least some conditions on primary keys.
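On the consuming side, a caller of the metadata endpoint could branch on the new field. This is a hypothetical sketch: only `isUsingIndices` comes from this PR, and the dataclass stands in for the real, much larger response type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HogQLMetadataResponse:
    # Only the field added by this PR is modeled here.
    isUsingIndices: str = "undecisive"


def warn_if_unindexed(response: HogQLMetadataResponse) -> Optional[str]:
    """Surface a hint to the user when the query is known to skip indexes."""
    if response.isUsingIndices == "no":
        return "Query does not appear to use table indexes; it may be slow."
    return None
```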
Does this work well for both Cloud and self-hosted?
Yes.
How did you test this code?
Added tests; ran locally.