Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Enforce hypertable usage with chunk exclusion for better performance #7745

Open
tpetry opened this issue Feb 19, 2025 · 2 comments
Labels
feature-request Feature proposal

Comments

@tpetry
Copy link

tpetry commented Feb 19, 2025

What problem does the new feature solve?

Timescale works great when you keep time ranges small or at least use a range so not all chunks are used. But sometimes a developer writes a query like this:

SELECT COUNT(*) FROM hypertbl WHERE col = 'val'

The impact? All chunks are scanned which is no problem when the application is developed and not tested on the entire data. But once the application moves to production, you're presented with terrible query performance. One of my clients even had this problem only after 6 months of using Timescale. They did know the exact details how Timescale works and used it like an improved PostgreSQL table.

What does the feature do?

I would love to see a setting in Timescale to prevent these issues:

SET timescaledb.fail_on_missing_chunk_exclusion = on

So basically you can now tell Timescale to immediately fail a query when:

  • The partioning column is not used in the query so there's no chunk exclusion
    • Case 1: The column is really not used
    • Case 2: The column is used in a way that prevents chunk exclusion, e.g. col + 1 between :min and :max
  • Chunk skipping is not used in the query
    • Case 1: No chunk skipping column is used
    • Case 2: The column is used in a way that prevents chunk skipping, e.g. col + 1 = 33

So basically, any query where the query planner is unable to apply any chunk exclusion logic should result in an error to prevent Timescale using in an inefficient way. This setting would not impact queries that target all chunks (e.g. WHERE col BETWEEN NOW() - INTERVAL '1000 years' AND NOW() - INTERVAL '1000 years'). These will still be inefficient, but its clear that they will scan all chunks. The intention is to block only queries where the query planner can't apply an query exclusion logic because neither the range column nor chunk skipping is used.

And when you really want to execute the slow query from the top you can either (1) temporarily disable the setting or (2) add a time range matching everything.

Implementation challenges

No response

@tpetry tpetry added the feature-request Feature proposal label Feb 19, 2025
@jamessewell
Copy link
Contributor

I think this is a great feature!

Another implementation could be not to fail, but to NOTICE or WARN.

@tpetry
Copy link
Author

tpetry commented Feb 27, 2025

I don't know of any database driver in a programming language that even looks at NOTICE or WARN client messages - only interactive tools. But those levels could be interesting when running within Timescale Cloud: They query would still run and the Insights page within a Timescale service could warn about the efficiency problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Feature proposal
Projects
None yet
Development

No branches or pull requests

2 participants