feat: Improve initial loading of persons page #29331

Draft · wants to merge 24 commits into master

Conversation

@timgl (Collaborator) commented Feb 27, 2025

Problem

Don't merge until #29319 is merged

Loading the persons page is slow. This PR speeds it up by not aggregating all persons just to return 100 results.
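
For context, a rough sketch of the shape of this change (not the exact SQL the PR generates; the table and column names below are assumptions based loosely on the person schema):

```python
# Rough sketch only: the real queries are built by the HogQL layer, and the
# table/column names here (raw_persons, version, created_at) are assumptions.

# Before: argMax-aggregate every person, then keep the first page of 100.
aggregate_all_then_page = """
    SELECT id, argMax(properties, version) AS properties
    FROM raw_persons
    GROUP BY id
    ORDER BY created_at DESC
    LIMIT 100
"""

# After: pick the candidate ids in an inner query first, so the aggregation
# only touches rows that can actually appear on the page.
page_first_then_aggregate = """
    SELECT id, argMax(properties, version) AS properties
    FROM raw_persons
    WHERE id IN (SELECT id FROM raw_persons ORDER BY created_at DESC LIMIT 100)
    GROUP BY id
    ORDER BY created_at DESC
    LIMIT 100
"""
```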

Changes

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@posthog-bot (Contributor) commented:

It looks like the code of hogql-parser has changed since last push, but its version stayed the same at 1.0.460. 👀
Make sure to resolve this in hogql_parser/setup.py before merging!

@timgl changed the title from "Speed up person loading" to "feat: Improve initial loading of persons page" on Feb 27, 2025
@timgl marked this pull request as a draft on February 27, 2025 at 23:43
@greptile-apps (bot) left a comment

PR Summary

Speed up person loading with optimized query execution

This PR introduces a significant performance optimization for loading the persons page by implementing a new query strategy that avoids unnecessary aggregation of all persons when only a limited number of results are needed.

  • Added LimitByExpr class to HogQL AST and modified the grammar to properly support LIMIT BY clauses in ClickHouse SQL (a rough sketch of such a node follows after this list)
  • Implemented PersonsArgMaxVersion.V3 optimization that pushes limit, offset, and ordering into inner queries for more efficient execution
  • Fixed SQL clause ordering in printer.py to ensure LIMIT BY appears before LIMIT to match ClickHouse syntax
  • Updated visitor pattern implementation to properly handle the new LimitByExpr nodes throughout the codebase
  • Added comprehensive tests to verify the optimization works correctly with various query patterns
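
As a rough illustration of the first bullet, a LIMIT BY node in an AST could look something like the sketch below; the actual LimitByExpr added in this PR may use different field names and a different base class.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Expr:
    """Stand-in for the HogQL AST base class (illustration only, not the real class)."""

@dataclass
class LimitByExpr(Expr):
    # ClickHouse's "LIMIT n BY expr1, expr2, ..." keeps at most n rows for each
    # distinct (expr1, expr2, ...) tuple, independently of the plain LIMIT.
    n: Optional[Expr] = None
    exprs: list[Expr] = field(default_factory=list)

# When printed, ClickHouse expects the per-group clause before the plain limit:
#   SELECT ... ORDER BY ... LIMIT n BY col1, col2 LIMIT m OFFSET k
```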

28 file(s) reviewed, 4 comment(s)

Comment on lines +104 to +106
inner_select.offset = clone_expr(node.offset, clear_locations=True, clear_types=True)
# Note: we actually change the outer node's behaviour here. This only works because in the if statement above we check whether the query is a 'simple' query (ie no joins/not in a subquery). This would mess up if we tried this with ie events
node.offset = None

style: Setting node.offset to None could have unexpected side effects if this code is ever used in a context where the outer query's offset is still needed after this function returns
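
A self-contained toy sketch of the copy-before-mutate pattern this comment hints at; the Query class below is a stand-in for illustration, not the HogQL AST:

```python
from copy import deepcopy
from dataclasses import dataclass
from typing import Optional

@dataclass
class Query:
    """Toy stand-in for a query node, only to illustrate the pattern."""
    limit: Optional[int] = None
    offset: Optional[int] = None

def push_offset_down(outer: Query, inner: Query) -> Query:
    """Return a copy of the outer query with its offset moved into the inner query."""
    new_outer = deepcopy(outer)      # leave the caller's object untouched
    inner.offset = new_outer.offset  # the inner query now does the skipping
    new_outer.offset = None
    return new_outer

outer, inner = Query(limit=100, offset=200), Query()
trimmed = push_offset_down(outer, inner)
assert (outer.offset, trimmed.offset, inner.offset) == (200, None, 200)
```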

Comment on lines +109 to +115
inner_select_where.append(
    ast.CompareOperation(
        left=ast.Field(chain=["where_optimization", "id"]),
        right=parse_select("SELECT id FROM raw_persons as limit_delete_optimization WHERE is_deleted = 1"),
        op=ast.CompareOperationOp.NotIn,
    )
)

style: Using a subquery here could be inefficient if there are many deleted persons. Consider using a join or EXISTS clause instead
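
For illustration, the suggested rewrite might look roughly like the sketch below. Whether HogQL accepts NOT EXISTS in this position is an assumption (a LEFT JOIN filtered on the joined key being NULL would be the other option the comment mentions), and the parse_select import path is assumed as well.

```python
from posthog.hogql.parser import parse_select  # import path assumed

# Hedged sketch: a correlated NOT EXISTS instead of feeding every deleted id
# into a NOT IN subquery. Aliases follow the snippet above; HogQL support for
# EXISTS here has not been verified.
not_deleted = parse_select(
    """
    SELECT id
    FROM raw_persons AS where_optimization
    WHERE NOT EXISTS (
        SELECT 1
        FROM raw_persons AS limit_delete_optimization
        WHERE limit_delete_optimization.id = where_optimization.id
          AND limit_delete_optimization.is_deleted = 1
    )
    """
)
```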

Comment on lines +130 to +159
@snapshot_clickhouse_queries
def test_limit_and_order_by(self):
    response = execute_hogql_query(
        parse_select("select id, properties.$some_prop, created_at from persons ORDER BY created_at limit 3"),
        self.team,
    )
    assert response.clickhouse
    self.assertIn("where_optimization", response.clickhouse)
    assert [x[0] for x in response.results] == [
        self.first_person.uuid,
        self.second_person.uuid,
        self.third_person.uuid,
    ]

    response = execute_hogql_query(
        parse_select("select id, properties.$some_prop from persons ORDER BY created_at limit 2, 1"),
        self.team,
    )
    assert [x[0] for x in response.results] == [self.second_person.uuid, self.third_person.uuid]

    _create_event(event="$pageview", distinct_id="1", team=self.team)
    _create_event(event="$pageview", distinct_id="2", team=self.team)
    _create_event(event="$pageview", distinct_id="3", team=self.team)
    response = execute_hogql_query(
        parse_select(
            "select id, persons.properties.$some_prop from events left join persons ON (events.person_id=persons.id) where persons.properties.$some_prop != 'something' limit 1"
        ),
        self.team,
    )
    assert len(response.results) == 1

logic: The test verifies that the optimization works correctly with ORDER BY and LIMIT clauses, but there's a potential issue in the second assertion. The test is checking that the result contains both the second_person and third_person UUIDs, but the query uses 'limit 2, 1' which should return only 1 result (offset 2, limit 1). The assertion should probably check for just one UUID, not two.
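
For reference, ClickHouse reads "LIMIT n, m" as "skip n rows, return m rows", i.e. the same as "LIMIT m OFFSET n"; whether HogQL mirrors that exactly is what this comment is questioning. A tiny self-contained illustration of that reading:

```python
# Under ClickHouse semantics, "LIMIT 2, 1" skips two rows and returns one,
# i.e. it behaves like Python slicing rows[offset:offset + limit].
rows = ["first", "second", "third"]
offset, limit = 2, 1  # "LIMIT 2, 1"
assert rows[offset:offset + limit] == ["third"]
```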

Comment on lines +150 to +152
_create_event(event="$pageview", distinct_id="1", team=self.team)
_create_event(event="$pageview", distinct_id="2", team=self.team)
_create_event(event="$pageview", distinct_id="3", team=self.team)

style: These three lines create duplicate events with the same parameters as lines 70-72. This could lead to confusion about the test data state. Consider consolidating these event creations or adding a comment explaining why they're duplicated.

github-actions bot commented Feb 27, 2025

Size Change: 0 B

Total Size: 9.73 MB

Filename                  Size
frontend/dist/toolbar.js  9.73 MB

compressed-size-action
