[Bug]: inconsistent bad query plan #6900
Comments
The difference seems to be Nested Loop Join vs Hash Join.
Yes. To eliminate variables, I wanted the data type to be consistent between the relative and absolute queries. In addition, the underlying data type on the table is also timestamp without time zone.
Why would this be the case? If both the query and the table are using the same data type, why would there be casting involved?
It did not. https://explain.dalibo.com/plan/949a6301d7ch487e#raw
That appears to have done it (although now it's deciding to play nice for a while even without that). And it appears that Grafana (the main source of the queries) does allow transactions via explicit …
Use of timestamp without time zone is generally discouraged in Postgres. See https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29_to_store_UTC_times
now() returns timestamp with time zone, so any constraint comparing now() against a timestamp without time zone column will involve data type conversions.
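For illustration, here is a minimal sketch of that conversion (the metrics table and its "time" column are hypothetical stand-ins, not from this issue):

```sql
-- now() yields timestamp with time zone:
SELECT pg_typeof(now());  -- "timestamp with time zone"

-- If the hypothetical metrics."time" column is timestamp WITHOUT time zone,
-- this filter compares timestamp against timestamptz. The column side is
-- implicitly converted using the session TimeZone, and that conversion is
-- only STABLE (not IMMUTABLE), which can keep chunk exclusion from
-- happening at plan time.
EXPLAIN
SELECT *
FROM metrics
WHERE "time" > now() - interval '5 minutes';
```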
Did you change the table definition as well? With the data type being timestamp with time zone, I would expect chunk exclusion to move from execution time to planning time, which should be a big improvement as well. The line …
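A rough way to see where exclusion happens, sketched against a hypothetical hypertable named metrics (not the reporter's schema): with plan-time exclusion, a plain EXPLAIN already lists only the matching chunks, while runtime exclusion shows up in the ANALYZE output instead.

```sql
-- Plan-time exclusion: with a constant filter, EXPLAIN (no ANALYZE) should
-- already show only the chunks that overlap the range.
EXPLAIN
SELECT * FROM metrics
WHERE "time" >= '2024-04-10 12:00:00+00' AND "time" < '2024-04-10 12:05:00+00';

-- Runtime (startup) exclusion: with now() in the filter, EXPLAIN ANALYZE
-- typically lists all chunks and reports a line like
-- "Chunks excluded during startup: N".
EXPLAIN (ANALYZE)
SELECT * FROM metrics
WHERE "time" > now() - interval '5 minutes';
```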
That's why I had …
No, that's a massive undertaking, as it involves changing a ton of data. However, for the purposes of this issue, I've created a new table using …
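A minimal sketch of that kind of side-by-side test (the table name, column name, conversion time zone, and chunk interval are all assumptions for illustration):

```sql
-- Hypothetical copy of the table, switched to timestamptz for comparison.
CREATE TABLE telegraf.haproxy_server_tz (LIKE telegraf.haproxy_server INCLUDING ALL);
ALTER TABLE telegraf.haproxy_server_tz
    ALTER COLUMN "time" TYPE timestamptz USING "time" AT TIME ZONE 'UTC';

SELECT create_hypertable('telegraf.haproxy_server_tz', 'time',
                         chunk_time_interval => interval '1 day');

-- Keep the implicit timestamp -> timestamptz cast consistent with the ALTER above.
SET timezone = 'UTC';

-- Backfill a recent slice so the two tables can be compared on the same query.
INSERT INTO telegraf.haproxy_server_tz
SELECT * FROM telegraf.haproxy_server
WHERE "time" > now() - interval '7 days';
```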
Hi, I was just wondering if this issue had a formal resolution or if you need more assistance from us? Thanks.
I'm not sure I follow. The issue is in the query planner. Or are you saying you don't intend to fix it?
I was mostly wondering if you backfilled the data and if it fixed the issue for you.
As long as all the time types involved in the query are …
@phemmer given that https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_timestamp_.28without_time_zone.29_to_store_UTC_times explicitly mentions that using …
The only reason given in that link is "This complicates many otherwise useful time calculations", implying that the only issue is that it makes things harder on the user. There is nothing about "this causes sub-optimal query planning and major deficiencies in performance".
@phemmer we will add this to the list of issues to take a look at from our end.
What type of bug is this?
Performance issue
What subsystems and features are affected?
Query planner
What happened?
I've been experiencing a lot of queries lately that inconsistently perform extremely poorly, as in they vary between sub-second and 5+ minutes. I don't know whether they're all the same problem, but I dug into one of them, which I'll use for this report.
I've tested a few slight variations of the query and have managed to produce seemingly insightful results. The basis of this particular query's issue seems to revolve around the time filter. If I use an absolute time range, the query always performs fine. If I use a large relative time range (large meaning 1h, relative meaning using now()), the query always performs fine. If I use a small relative time range (small meaning 5m), the query is inconsistent (sometimes <1s, sometimes >5m).

Now, I did run these on a live database with data constantly changing; however, after dozens of tries, the above observations have always held. The short+relative query can jump between good & bad from one second to the next. Sometimes it'll run well for only a few seconds before going bad again, and sometimes it'll run well for many minutes.
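For reference, the two filter styles in question look roughly like this (the column name and aggregate are placeholders; the real queries are in the plans listed below):

```sql
-- Absolute time range: both bounds are constants known at plan time.
SELECT count(*)
FROM haproxy_server
WHERE "time" >= '2024-04-10 12:00:00' AND "time" < '2024-04-10 12:05:00';

-- Relative time range: now() is STABLE, so the lower bound is only resolved
-- at execution time.
SELECT count(*)
FROM haproxy_server
WHERE "time" > now() - interval '5 minutes';
```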
So here are the queries (query + explain analyze output):
5m absolute time range
1h relative time range
5m relative time range - bad plan
5m relative time range - good plan
In the query examples, you can see that the good plans are all exactly the same, and the bad plan is substantially different. And even though the good plans involve parallel workers, that shouldn't explain the difference. We can see in the 5m+relative+good plan that 7 workers were used, so running it without parallelization should take <7x longer, but it doesn't; it is instead 313x longer.
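Given the Nested Loop Join vs Hash Join difference noted in the comments, one way to separate the join choice from the parallelism question is to steer the planner with standard PostgreSQL session settings and re-run the same statement. This is only a diagnostic sketch; the query below is a placeholder for the actual one.

```sql
-- In a session currently producing the bad plan, discourage nested loops:
SET enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM haproxy_server
WHERE "time" > now() - interval '5 minutes';
RESET enable_nestloop;

-- Conversely, on a good run, disable parallelism to check how much of the
-- speedup comes from the workers rather than from the plan shape:
SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM haproxy_server
WHERE "time" > now() - interval '5 minutes';
RESET max_parallel_workers_per_gather;
```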
TimescaleDB version affected
2.14.2
PostgreSQL version used
16.2
What operating system did you use?
Debian 10
What installation method did you use?
Deb/Apt
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
Unfortunately I do not know how to reproduce it in a clean environment. I'll provide the basic setup info below, but please feel free to request any additional information and I'll provide it.
First, haproxy_server is a view defined as:

telegraf.haproxy_server_tag is:

telegraf.haproxy_server is: