Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(err): sparkline query is too expensive #29144

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

oliverb123
Copy link
Contributor

@oliverb123 oliverb123 commented Feb 24, 2025

Final query structure looks like the below. I've tested this in production, comparing its results to the previous designs for our team, and it looks right.

SELECT
    summary.issue_id AS id,
    summary.occurrences AS occurrences,
    summary.sessions AS sessions,
    summary.users AS users,
    summary.first_seen AS first_seen,
    summary.last_seen AS last_seen,
    if(greater(length(coalesce(cte_volumeDay.count, arrayMap(x -> 0, range(24)))), 0), coalesce(cte_volumeDay.count, arrayMap(x -> 0, range(24))), arrayMap(x -> 0, range(24))) AS volumeDay,
    if(greater(length(coalesce(cte_volumeMonth.count, arrayMap(x -> 0, range(31)))), 0), coalesce(cte_volumeMonth.count, arrayMap(x -> 0, range(31))), arrayMap(x -> 0, range(31))) AS volumeMonth,
    if(greater(length(coalesce(cte_customVolume.count, arrayMap(x -> 0, range(168)))), 0), coalesce(cte_customVolume.count, arrayMap(x -> 0, range(168))), arrayMap(x -> 0, range(168))) AS customVolume
FROM
    (SELECT
        issue_id AS issue_id,
        count(DISTINCT uuid) AS occurrences,
        count(DISTINCT nullIf($session_id, '')) AS sessions,
        count(DISTINCT distinct_id) AS users,
        max(timestamp) AS last_seen,
        min(timestamp) AS first_seen
    FROM
        events
    WHERE
        and(equals(event, '$exception'), isNotNull(issue_id), greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.741930')))
    GROUP BY
        issue_id) AS summary
    LEFT JOIN (SELECT
        inner.issue_id,
        groupArray(inner.count) AS count
    FROM
        (SELECT
            coalesce(ec.count, 0) AS count,
            s.diff AS diff,
            di.issue_id AS issue_id
        FROM
            (SELECT
                arrayJoin(range(24)) AS diff) AS s
            CROSS JOIN (SELECT
                DISTINCT issue_id AS issue_id
            FROM
                events
            WHERE
                and(equals(event, '$exception'), isNotNull(issue_id), and(1, greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.740957'))), greater(timestamp, minus(now(), toIntervalHour(25))))) AS di
            LEFT JOIN (SELECT
                count(uuid) AS count,
                dateDiff('hour', toStartOfHour(timestamp), toStartOfHour(now())) AS diff,
                issue_id AS issue_id
            FROM
                events
            WHERE
                and(equals(event, '$exception'), isNotNull(issue_id), and(1, greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.741204'))), greater(timestamp, minus(now(), toIntervalHour(25))))
            GROUP BY
                diff,
                issue_id
            HAVING
                less(diff, 24)) AS ec ON and(equals(s.diff, ec.diff), equals(di.issue_id, ec.issue_id))
        ORDER BY
            diff DESC) AS inner
    GROUP BY
        issue_id) AS cte_volumeDay ON equals(summary.issue_id, cte_volumeDay.issue_id)
    LEFT JOIN (SELECT
        inner.issue_id,
        groupArray(inner.count) AS count
    FROM
        (SELECT
            coalesce(ec.count, 0) AS count,
            s.diff AS diff,
            di.issue_id AS issue_id
        FROM
            (SELECT
                arrayJoin(range(31)) AS diff) AS s
            CROSS JOIN (SELECT
                DISTINCT issue_id AS issue_id
            FROM
                events
            WHERE
                and(equals(event, '$exception'), isNotNull(issue_id), and(1, greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.741450'))), greater(timestamp, minus(now(), toIntervalDay(32))))) AS di
            LEFT JOIN (SELECT
                count(uuid) AS count,
                dateDiff('day', toStartOfDay(timestamp), toStartOfDay(now())) AS diff,
                issue_id AS issue_id
            FROM
                events
            WHERE
                and(equals(event, '$exception'), isNotNull(issue_id), and(1, greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.741645'))), greater(timestamp, minus(now(), toIntervalDay(32))))
            GROUP BY
                diff,
                issue_id
            HAVING
                less(diff, 24)) AS ec ON and(equals(s.diff, ec.diff), equals(di.issue_id, ec.issue_id))
        ORDER BY
            diff DESC) AS inner
    GROUP BY
        issue_id) AS cte_volumeMonth ON equals(summary.issue_id, cte_volumeMonth.issue_id)
    LEFT JOIN (SELECT
        inner.issue_id,
        groupArray(inner.count) AS count
    FROM
        (SELECT
            coalesce(ec.count, 0) AS count,
            s.diff AS diff,
            di.issue_id AS issue_id
        FROM
            (SELECT
                arrayJoin(range(168)) AS diff) AS s
            CROSS JOIN (SELECT
                DISTINCT issue_id AS issue_id
            FROM
                events
            WHERE
                and(equals(event, '$exception'), isNotNull(issue_id), and(1, greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.741764'))), greater(timestamp, minus(now(), toIntervalHour(169))))) AS di
            LEFT JOIN (SELECT
                count(uuid) AS count,
                dateDiff('hour', toStartOfHour(timestamp), toStartOfHour(now())) AS diff,
                issue_id AS issue_id
            FROM
                events
            WHERE
                and(equals(event, '$exception'), isNotNull(issue_id), and(1, greaterOrEquals(timestamp, toDateTime('2025-02-18 15:48:25.741821'))), greater(timestamp, minus(now(), toIntervalHour(169))))
            GROUP BY
                diff,
                issue_id
            HAVING
                less(diff, 24)) AS ec ON and(equals(s.diff, ec.diff), equals(di.issue_id, ec.issue_id))
        ORDER BY
            diff DESC) AS inner
    GROUP BY
        issue_id) AS cte_customVolume ON equals(summary.issue_id, cte_customVolume.issue_id)
ORDER BY
    last_seen DESC
LIMIT 51
OFFSET 0

@posthog-bot
Copy link
Contributor

Hey @oliverb123! 👋
This pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR optimizes error tracking sparkline queries by implementing Common Table Expressions (CTEs) for more efficient volume calculations and event counting.

  • Refactored ErrorTrackingQueryRunner in /posthog/hogql_queries/error_tracking_query_runner.py to use CTEs for better query performance
  • Added predefined sparkline configurations for day/month views with configurable intervals
  • Added new test file /posthog/management/commands/test_error_tracking_query_runner.py but needs improvement in test coverage and assertions
  • Implemented cross-joins with CTEs to efficiently calculate event counts across time intervals
  • Added proper handling of custom volume configurations through sparkLineConfigs dictionary

2 file(s) reviewed, 2 comment(s)
Edit PR Review Bot Settings | Greptile

Comment on lines 48 to 51
self.sparkLineConfigs = {
"volumeDay": ErrorTrackingSparklineConfig(interval=Interval.HOUR, value=24),
"volumeMonth": ErrorTrackingSparklineConfig(interval=Interval.DAY, value=31)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: inconsistent naming: sparkLineConfigs vs sparklineConfigs (camelCase)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daibhin 👀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIs fight just as much as developers. Greptile is a tabs man, Cursor is spaces

@oliverb123
Copy link
Contributor Author

Going to take another swing at this form a perf perspective

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants