[Bug]: Secondary index is ignored for joins involving compressed tables if there is a mismatch in where clause column and value types #5413
Comments
Hi @genedavis
Hi @mahipv I'm not sure it's quite the same issue - that one is a more general concern about allowing indexes on the compressed chunks. My concern is that I can't query the UNCOMPRESSED chunks with the secondary index. Do you agree? Thx
Hi @genedavis, Once a table is set up for compression, TimescaleDB creates new secondary indexes to look up the relevant chunks. So even if the query is accessing the uncompressed chunks, since the hypertable is compressed, the old indexes defined on the hypertable are not used. https://docs.timescale.com/timescaledb/latest/how-to-guides/compression/about-compression/
@mahipv - I do not see that information in that link - I see references to indexes on the compressed chunks, but you still have indexes on the uncompressed chunks. This falls back on (presumably) the query planner. That is supposed to be part of the "beauty" of the hypertable design: in near-term time frames you get standard hyperchunk behavior, and in longer-term time frames you get "columnar-like" query behavior. Just want to make sure we are on the same page. Thanks
Related issue: #5432
Hi @genedavis, Based on the steps you mentioned, I have the following SQL script:

```sql
-- Create system_tag with 300 tags and 150 ids
CREATE TABLE system_tag (system_id int NOT NULL, tag_name TEXT);

INSERT INTO system_tag
SELECT id % 150, 'tag_' || id FROM generate_series(1,300,1) id;

-- Create the tag_value table
CREATE TABLE tag_value (
    tag_name TEXT,
    tag_time timestamptz NOT NULL,
    value float
);

-- Create a secondary index on tag_value(tag_name)
CREATE INDEX tag_secondary_idx ON tag_value(tag_name);

-- Create the hypertable
SELECT create_hypertable('tag_value', 'tag_time');

-- Populate it - time ranges over a month with a sample every 2 hours
INSERT INTO tag_value
SELECT 'tag_' || id, time, ceil(random() * 100)::int
FROM generate_series(1,300,1) id,
     generate_series('2023-03-01 0:00:00+0'::timestamptz,
                     '2023-04-01 0:00:00+0'::timestamptz, '2h') time;

-- Explain a SELECT on a particular tag_time range
ANALYZE;
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
WHERE s.tag_name = t.tag_name AND system_id = 50
  AND tag_time > '2023-03-25 0:00:00+0'::timestamptz
  AND tag_time < '2023-04-01 0:00:00+0'::timestamptz;

-- Enable compression
ALTER TABLE tag_value SET (timescaledb.compress);

-- Compress the chunks that hold the first 20 days of data
SELECT compress_chunk(i, if_not_compressed => true)
FROM show_chunks('tag_value',
                 older_than => '2023-03-20 0:00:00+0'::timestamptz) i;

-- Show that the hypertable is only partially compressed
SELECT chunk_name, is_compressed FROM timescaledb_information.chunks
WHERE hypertable_name = 'tag_value';

-- Now retry the query
ANALYZE;
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
WHERE s.tag_name = t.tag_name AND system_id = 50
  AND tag_time > '2023-03-25 0:00:00+0'::timestamptz
  AND tag_time < '2023-04-01 0:00:00+0'::timestamptz;
```

The same SELECT query is used both before and after compression, and the date range filter covers only the last two chunks of the hypertable, which will always be uncompressed. When I run this, the EXPLAIN command prints almost the same query plans before and after compression. In both cases, the query planner uses the secondary index `tag_secondary_idx`.

Explain before compression:
Explain after compression:
Only when I adjust the tag_time WHERE filter to include compressed chunks does the planner ignore the index, and I get a plan that takes more time to complete:
To summarise, the index is indeed used by the planner when you are looking for data that is only in the uncompressed chunks. The index is ignored only when the data you are interested in spans both the uncompressed and the compressed chunks. Please let me know if this answers your question. Thank you!
Thank you @lkshminarayanan for your thorough analysis. I will analyze your test and compare it to my case. It seems I will need to provide more specific steps to reproduce.
@lkshminarayanan I think I have an extra piece of information that may help reproduce: in our invocations, we are doing some date_trunc() math, which seems to make the difference. Using your example code:
This might actually give me a workaround I did not realize until now! |
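Editor's note: in PostgreSQL, `date_trunc()` returns a value of the same type as its input, so truncating a timezone-stripped value produces a plain TIMESTAMP rather than a TIMESTAMPTZ. A hypothetical variant of the repro query that would exercise this follows; the specific expressions are an illustration, not the reporter's exact code:

```sql
-- date_trunc over a plain TIMESTAMP yields a TIMESTAMP (no time zone),
-- so comparing it against the timestamptz column tag_time introduces
-- a column/value type mismatch in the WHERE clause
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
WHERE s.tag_name = t.tag_name AND system_id = 50
  AND tag_time >= date_trunc('day', TIMESTAMP '2023-03-28 12:00:00')
  AND tag_time <  date_trunc('day', TIMESTAMP '2023-03-29 12:00:00');
```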
Hi @genedavis, I was able to reproduce the issue. The type of the value in the tag_time filter is what makes the difference. Using TIMESTAMPTZ values, which match the column type, the planner uses the index:

```sql
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
WHERE s.tag_name = t.tag_name AND system_id = 50
  AND tag_time >= TIMESTAMPTZ '2023-03-28 00:00:00'
  AND tag_time < TIMESTAMPTZ '2023-03-29 00:00:00';
```

But if you use plain TIMESTAMP values, the planner ignores the index:

```sql
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
WHERE s.tag_name = t.tag_name AND system_id = 50
  AND tag_time >= TIMESTAMP '2023-03-28 00:00:00'
  AND tag_time < TIMESTAMP '2023-03-29 00:00:00';
```

And this happens only if the hypertable is compressed. For uncompressed hypertables, the optimal plan is chosen for both TIMESTAMP and TIMESTAMPTZ values. Thank you for reporting this issue!
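Editor's note: a hedged workaround sketch based on this analysis is to keep the comparison values typed as timestamptz, either by writing TIMESTAMPTZ literals or by casting explicitly. The cast form below is an editorial suggestion, not taken from the thread:

```sql
-- Casting the plain TIMESTAMP values to timestamptz restores the
-- column/value type match, so the planner can use the secondary index
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
WHERE s.tag_name = t.tag_name AND system_id = 50
  AND tag_time >= (TIMESTAMP '2023-03-28 00:00:00')::timestamptz
  AND tag_time <  (TIMESTAMP '2023-03-29 00:00:00')::timestamptz;
```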
Thanks again @lkshminarayanan - that all makes perfect sense. I am likely to file another, slightly orthogonal issue, but one I believe is (eventually) manageable by Timescale: when querying a table that has some compressed hyperchunks, with a secondary index on the non-compressed components (setting aside the issues above), a similar issue arises when the SQL is in a function definition - just an FYI.
What type of bug is this?
Performance issue
What subsystems and features are affected?
Query planner
What happened?
See also - #2829 - it might be a duplicate of that. However that was querying compressed chunks and this is not.
Basic scenario:
This was all working very well until I added a compression rule for data older than 30 days. Now the query performs a full scan of the hyperchunk based on time instead of using the index. Queries for some groups of tags have gone from 10 ms up to 800 ms.
Thanks - Gene
Edit: my workaround was to decompress all chunks and set compress to false on this table, which worked. However, I would have hoped (maybe it's not easy) that the query planner could have seen that the query was bounded well outside of the compression time window, so I wouldn't have had to do this.
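Editor's note: the decompress-and-disable workaround described above can be sketched roughly as follows, using the `tag_value` table name from the repro script. `decompress_chunk` and the `timescaledb.compress` setting are documented TimescaleDB interfaces, but note that any compression policy on the table would also need to be removed first:

```sql
-- Decompress every compressed chunk of the hypertable
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('tag_value') c;

-- Then disable compression on the table
ALTER TABLE tag_value SET (timescaledb.compress = false);
```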
The query is a join of two tables: the hypertable and a system-tag mapping table that provides the appropriate tags to look up.
TimescaleDB version affected
2.7.2
PostgreSQL version used
14.5
What operating system did you use?
Ubuntu 22.04.1 LTS x64
What installation method did you use?
Docker
What platform did you run on?
Microsoft Azure Cloud
Relevant log output and stack trace
No response
How can we reproduce the bug?