
[Bug]: Secondary index is ignored for joins involving compressed tables if there is a mismatch in where clause column and value types #5413

Open
genedavis opened this issue Mar 8, 2023 · 10 comments

Comments

@genedavis

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query planner

What happened?

See also #2829 - this may be a duplicate of that issue. However, that one involved querying compressed chunks, and this one does not.

Basic scenario:

  • I have a large hypertable for looking up IoT-based data (tag, time, value, etc.)
  • I created an index on that table starting with tag, since that is a common initial lookup filter
  • I run a query collecting data for a window of the past few hours for a group of tags

This was all working very well until I added a compression rule for data older than 30 days. Now the query performs a full scan of the hyperchunk based on time instead of using the index. Queries for some groups of tags have gone from 10 ms up to 800 ms.

Thanks - Gene

Edit: my workaround was to decompress all chunks and set compress to false on this table, which worked. However, I would have hoped (maybe it's not easy) that the query planner could see that the query was bounded well outside the compression time window, so I wouldn't have had to do this.
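A minimal sketch of that workaround, assuming the hypertable is named tag_value and has a compression policy attached (table and policy names are illustrative, not from the original report):

```sql
-- Remove the compression policy so no further chunks get compressed
SELECT remove_compression_policy('tag_value');

-- Decompress every chunk that is currently compressed
SELECT decompress_chunk(c, if_compressed => true)
  FROM show_chunks('tag_value') c;

-- Finally, disable compression on the hypertable
ALTER TABLE tag_value SET (timescaledb.compress = false);
```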

The query is a join of two tables: the hypertable and a system-tag mapping table that provides the appropriate tags to look up.

TimescaleDB version affected

2.7.2

PostgreSQL version used

14.5

What operating system did you use?

Ubuntu 22.04.1 LTS x64

What installation method did you use?

Docker

What platform did you run on?

Microsoft Azure Cloud

Relevant log output and stack trace

No response

How can we reproduce the bug?

See https://github.com/timescale/timescaledb/issues/2829 - though that query hits the compressed chunks, while mine fails when querying the uncompressed chunks.

Basic steps:

1. Create a tag_value hypertable (tag_name, tag_time, value)
2. Create a secondary (covering) index on (tag_name, tag_time, value)
3. Create a system_tag table (system_id, tag_name) that has one-to-many mappings between systems and tags
4. Populate sample data in the hypertable
5. Populate simple data into the system table (mapping a system_id to maybe two tag_names)
6. Run a query looking up data in the tag_value hypertable in a specific time range (say, the last few hours, where there are a few million records), but only for tags that match a certain system in the system_tag table - it should run fast because it uses the covering secondary index
7. Add a compression rule to compress data more than 30 days old
8. Repeat step 6 - now it's slow because it's using the regular tag_time index instead of the secondary index
@genedavis genedavis added the bug label Mar 8, 2023
@genedavis genedavis changed the title [Bug]: Tables with compression defined on them don't leverage indexes [Bug]: Tables with compression defined on them don't leverage secondary indexes Mar 8, 2023
@mahipv
Contributor

mahipv commented Mar 8, 2023

Hi @genedavis
Thank you for the detailed steps for reproducing the issue. This feature request is a duplicate of the issue tracked by #2418.
Can I mark this as a duplicate and close?

@genedavis
Author

Hi @mahipv, I'm not sure it's quite the same issue - that is a more general concern about allowing indexes on the compressed chunks. My concern is that I can't query the UNCOMPRESSED chunks with the secondary index. Do you agree? Thx

@mahipv
Contributor

mahipv commented Mar 10, 2023

Hi @genedavis,

Once a table is set up for compression, TimescaleDB creates new secondary indexes to look up the relevant chunks. So even if the query accesses the uncompressed chunks, since the hypertable is compressed, the old indexes defined on the hypertable are not used.

https://docs.timescale.com/timescaledb/latest/how-to-guides/compression/about-compression/
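One way to see which indexes actually exist on an individual chunk is to query pg_indexes directly (the chunk name below is illustrative; actual names vary and can be found with show_chunks()):

```sql
-- List the indexes defined on one chunk of the hypertable
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE schemaname = '_timescaledb_internal'
   AND tablename = '_hyper_1_5_chunk';
```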

@genedavis
Author

@mahipv - I do not see that information at that link - I see references to indexes on the compressed chunks, but you still have indexes on the uncompressed chunks. This falls back on (presumably) the query planner. That is supposed to be part of the "beauty" of the hypertable design: in near-term time frames you get standard hyperchunk behavior, and in longer-term time frames you get "columnar-like" query behavior.

Just want to make sure we are on the same page. Thanks

@lkshminarayanan
Contributor

Related issue #5432

@lkshminarayanan
Contributor

Hi @genedavis,

Based on the steps you mentioned I have the following SQL script :

-- Create system_tag with 300 tags and 150 ids
CREATE TABLE system_tag (system_id int NOT NULL, tag_name TEXT);
INSERT INTO system_tag
  SELECT id % 150, 'tag_' || id FROM generate_series(1,300,1) id;

-- Create the tag_value table
CREATE TABLE tag_value (
  tag_name TEXT,
  tag_time timestamptz NOT NULL,
  value float
);

-- Create a secondary index on tag_value(tag_name)
CREATE INDEX tag_secondary_idx ON tag_value(tag_name);

-- Create hypertable
SELECT create_hypertable('tag_value', 'tag_time');

-- Populate it - one month of data at 2-hour intervals
INSERT INTO tag_value
  SELECT 'tag_' || id, time, ceil(random() * 100)::int
    FROM generate_series(1,300,1) id,
       generate_series('2023-03-01 0:00:00+0'::timestamptz,
                       '2023-04-01 0:00:00+0'::timestamptz,'2h') time;

-- Explain a SELECT on particular tag_time range
ANALYZE;
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time > '2023-03-25 0:00:00+0'::timestamptz
    AND tag_time < '2023-04-01 0:00:00+0'::timestamptz;

ALTER TABLE tag_value SET (timescaledb.compress);
-- Compress the chunks that hold the first 20 days of data
SELECT compress_chunk(i, if_not_compressed => true)
    FROM show_chunks('tag_value',
      older_than => '2023-03-20 0:00:00+0'::timestamptz) i;

-- Show that the hypertable is only partially compressed
SELECT chunk_name, is_compressed FROM timescaledb_information.chunks
  WHERE hypertable_name = 'tag_value';

-- Now retry the query again
ANALYZE;
EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time > '2023-03-25 0:00:00+0'::timestamptz
    AND tag_time < '2023-04-01 0:00:00+0'::timestamptz;

The same SELECT query is used both before and after compression, and the date range filters out only the last two chunks of the hypertable, which will always be uncompressed. That is, the compress_chunk function compresses the first 20 days of data while the SELECT query is interested in only the last 5 days.

When I run this, the EXPLAIN command prints almost the same query plans before and after compression. In both cases, the query planner uses the secondary index tag_secondary_idx (i.e. *_chunk_tag_secondary_idx on the uncompressed chunks).

Explain before compression :

                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.93..356.01 rows=166 width=34) (actual time=0.067..1.255 rows=166 loops=1)
   ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.018..0.056 rows=2 loops=1)
         Filter: (system_id = 50)
         Rows Removed by Filter: 298
   ->  Append  (cost=4.93..174.30 rows=83 width=23) (actual time=0.035..0.378 rows=83 loops=2)
         ->  Bitmap Heap Scan on _hyper_1_5_chunk t_1  (cost=4.93..131.32 rows=59 width=23) (actual time=0.031..0.114 rows=59 loops=2)
               Recheck Cond: (tag_name = s.tag_name)
               Filter: ((tag_time > '2023-03-25 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
               Rows Removed by Filter: 25
               Heap Blocks: exact=3
               ->  Bitmap Index Scan on _hyper_1_5_chunk_tag_secondary_idx  (cost=0.00..4.92 rows=84 width=0) (actual time=0.014..0.014 rows=84 loops=2)
                     Index Cond: (tag_name = s.tag_name)
         ->  Bitmap Heap Scan on _hyper_1_6_chunk t_2  (cost=4.48..42.56 rows=24 width=23) (actual time=0.016..0.051 rows=24 loops=2)
               Recheck Cond: (tag_name = s.tag_name)
               Filter: ((tag_time > '2023-03-25 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
               Rows Removed by Filter: 1
               Heap Blocks: exact=2
               ->  Bitmap Index Scan on _hyper_1_6_chunk_tag_secondary_idx  (cost=0.00..4.47 rows=25 width=0) (actual time=0.009..0.010 rows=25 loops=2)
                     Index Cond: (tag_name = s.tag_name)
 Planning Time: 1.683 ms
 Execution Time: 1.534 ms
(21 rows)

Explain after compression :

                                                                            QUERY PLAN                                                                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.93..356.01 rows=166 width=34) (actual time=0.070..1.281 rows=166 loops=1)
   ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.018..0.057 rows=2 loops=1)
         Filter: (system_id = 50)
         Rows Removed by Filter: 298
   ->  Append  (cost=4.93..174.30 rows=83 width=23) (actual time=0.038..0.387 rows=83 loops=2)
         ->  Bitmap Heap Scan on _hyper_1_5_chunk t_1  (cost=4.93..131.32 rows=59 width=23) (actual time=0.033..0.118 rows=59 loops=2)
               Recheck Cond: (tag_name = s.tag_name)
               Filter: ((tag_time > '2023-03-25 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
               Rows Removed by Filter: 25
               Heap Blocks: exact=3
               ->  Bitmap Index Scan on _hyper_1_5_chunk_tag_secondary_idx  (cost=0.00..4.92 rows=84 width=0) (actual time=0.014..0.015 rows=84 loops=2)
                     Index Cond: (tag_name = s.tag_name)
         ->  Bitmap Heap Scan on _hyper_1_6_chunk t_2  (cost=4.48..42.56 rows=24 width=23) (actual time=0.016..0.051 rows=24 loops=2)
               Recheck Cond: (tag_name = s.tag_name)
               Filter: ((tag_time > '2023-03-25 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
               Rows Removed by Filter: 1
               Heap Blocks: exact=2
               ->  Bitmap Index Scan on _hyper_1_6_chunk_tag_secondary_idx  (cost=0.00..4.47 rows=25 width=0) (actual time=0.009..0.010 rows=25 loops=2)
                     Index Cond: (tag_name = s.tag_name)
 Planning Time: 1.156 ms
 Execution Time: 1.543 ms
(21 rows)

Only when I adjust the tag_time WHERE filter to include compressed chunks does the planner ignore the index, and I get a plan that takes more time to complete:

postgres=# EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time > '2023-03-10 0:00:00+0'::timestamptz
    AND tag_time < '2023-04-01 0:00:00+0'::timestamptz;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=5.84..1947.47 rows=531 width=34) (actual time=0.914..421.389 rows=526 loops=1)
   Hash Cond: (t_1.tag_name = s.tag_name)
   ->  Append  (cost=0.06..1637.89 rows=79600 width=23) (actual time=0.173..313.920 rows=78900 loops=1)
         ->  Custom Scan (DecompressChunk) on _hyper_1_3_chunk t_1  (cost=0.06..1.39 rows=22000 width=23) (actual time=0.169..34.278 rows=21300 loops=1)
               Filter: ((tag_time > '2023-03-10 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
               Rows Removed by Filter: 700
               ->  Seq Scan on compress_hyper_2_9_chunk  (cost=0.00..1.39 rows=22 width=116) (actual time=0.011..0.060 rows=22 loops=1)
                     Filter: ((_ts_meta_max_1 > '2023-03-10 05:30:00+05:30'::timestamp with time zone) AND (_ts_meta_min_1 < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
                     Rows Removed by Filter: 4
         ->  Seq Scan on _hyper_1_4_chunk t_2  (cost=0.00..539.00 rows=25200 width=23) (actual time=0.015..35.980 rows=25200 loops=1)
               Filter: ((tag_time > '2023-03-10 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
         ->  Seq Scan on _hyper_1_5_chunk t_3  (cost=0.00..539.00 rows=25200 width=23) (actual time=0.012..35.877 rows=25200 loops=1)
               Filter: ((tag_time > '2023-03-10 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
         ->  Seq Scan on _hyper_1_6_chunk t_4  (cost=0.00..160.50 rows=7200 width=23) (actual time=0.011..10.277 rows=7200 loops=1)
               Filter: ((tag_time > '2023-03-10 05:30:00+05:30'::timestamp with time zone) AND (tag_time < '2023-04-01 05:30:00+05:30'::timestamp with time zone))
               Rows Removed by Filter: 300
   ->  Hash  (cost=5.75..5.75 rows=2 width=11) (actual time=0.061..0.067 rows=2 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.015..0.053 rows=2 loops=1)
               Filter: (system_id = 50)
               Rows Removed by Filter: 298
 Planning Time: 2.137 ms
 Execution Time: 422.176 ms
(23 rows)

Time: 436.263 ms

To summarise, the index is indeed used by the planner when you are looking for data that is only in the uncompressed chunks. The index is ignored only when the data you are interested in spans both the uncompressed and compressed chunks.
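Given that behaviour, one can check in advance whether a time filter stays within the uncompressed chunks by looking at chunk ranges in the informational view (table name tag_value and the timestamps are from the script above):

```sql
-- Which chunks does a given time range touch, and are any of them compressed?
SELECT chunk_name, is_compressed
  FROM timescaledb_information.chunks
 WHERE hypertable_name = 'tag_value'
   AND range_end   > '2023-03-25 00:00:00+00'::timestamptz
   AND range_start < '2023-04-01 00:00:00+00'::timestamptz;
```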

Please let me know if this answers your question. Thank you!

@genedavis
Author

Thank you @lkshminarayanan for your thorough analysis. I will analyze your test and compare to my case. It seems I will need to provide more specific steps to reproduce.

@genedavis
Author

genedavis commented Mar 23, 2023

@lkshminarayanan I think I have an extra piece of information that may help reproduce: In our invocations, we are doing some date_trunc() math, which seems to make the difference. Using your example code:

tsdb=# EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time > now() - interval '300 minutes' and tag_time < now();
                                                                                               QUERY PLAN                                                                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.24..50.02 rows=6 width=34) (actual time=0.083..0.154 rows=6 loops=1)
   ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.013..0.032 rows=2 loops=1)
         Filter: (system_id = 50)
         Rows Removed by Filter: 298
   ->  Custom Scan (ChunkAppend) on tag_value t  (cost=16.24..22.09 rows=4 width=23) (actual time=0.056..0.058 rows=3 loops=2)
         Chunks excluded during startup: 1
         ->  Bitmap Heap Scan on _hyper_13_216_chunk t_1  (cost=16.24..19.58 rows=3 width=23) (actual time=0.056..0.057 rows=3 loops=2)
               Recheck Cond: ((tag_name = s.tag_name) AND (tag_time > (now() - '05:00:00'::interval)) AND (tag_time < now()) AND (tag_time > '2023-03-23 17:49:23.722315+00'::timestamp with time zone))
               Heap Blocks: exact=2
               ->  BitmapAnd  (cost=16.24..16.24 rows=3 width=0) (actual time=0.051..0.051 rows=0 loops=2)
                     ->  Bitmap Index Scan on _hyper_13_216_chunk_tag_secondary_idx  (cost=0.00..2.02 rows=84 width=0) (actual time=0.009..0.009 rows=84 loops=2)
                           Index Cond: (tag_name = s.tag_name)
                     ->  Bitmap Index Scan on _hyper_13_216_chunk_tag_value_tag_time_idx  (cost=0.00..13.74 rows=900 width=0) (actual time=0.041..0.041 rows=900 loops=2)
                           Index Cond: ((tag_time > (now() - '05:00:00'::interval)) AND (tag_time < now()) AND (tag_time > '2023-03-23 17:49:23.722315+00'::timestamp with time zone))
 Planning Time: 0.637 ms
 Execution Time: 0.216 ms
(16 rows)

tsdb=# EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time >= date_trunc('minute',now() - interval '300 minute') AND tag_time <= date_trunc('minute',now());
                                                                                QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1017.50..6050.22 rows=11006 width=34) (actual time=2.282..677.073 rows=6 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Hash Join  (cost=17.50..3949.62 rows=4586 width=34) (actual time=0.546..0.684 rows=2 loops=3)
         Hash Cond: (t.tag_name = s.tag_name)
         ->  Parallel Custom Scan (ChunkAppend) on tag_value t  (cost=11.72..232.02 rows=972531 width=23) (actual time=0.382..0.553 rows=300 loops=3)
               Chunks excluded during startup: 5
               ->  Parallel Bitmap Heap Scan on _hyper_13_216_chunk t_1  (cost=11.72..187.28 rows=529 width=23) (actual time=1.140..1.586 rows=900 loops=1)
                     Recheck Cond: ((tag_time >= date_trunc('minute'::text, (now() - '05:00:00'::interval))) AND (tag_time <= date_trunc('minute'::text, now())))
                     Heap Blocks: exact=161
                     ->  Bitmap Index Scan on _hyper_13_216_chunk_tag_value_tag_time_idx  (cost=0.00..11.50 rows=900 width=0) (actual time=1.109..1.109 rows=900 loops=1)
                           Index Cond: ((tag_time >= date_trunc('minute'::text, (now() - '05:00:00'::interval))) AND (tag_time <= date_trunc('minute'::text, now())))
         ->  Hash  (cost=5.75..5.75 rows=2 width=11) (actual time=0.042..0.043 rows=2 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.020..0.037 rows=2 loops=3)
                     Filter: (system_id = 50)
                     Rows Removed by Filter: 298
 Planning Time: 689.519 ms
 Execution Time: 677.240 ms

This might actually give me a workaround I did not realize until now!

@lkshminarayanan
Contributor

lkshminarayanan commented Mar 24, 2023

Hi @genedavis,

I was able to reproduce the issue. The date_trunc function returns a timestamp type, whereas the tag_time column is timestamptz, and for some reason this prevents the planner from choosing the optimal plan. This doesn't change even if you explicitly cast the date_trunc return value to timestamptz. It is also reproducible if you directly pass timestamp values in the WHERE filter, i.e.:

Using TIMESTAMPTZ values in WHERE gives you an optimal plan that uses the secondary index:

EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time >= TIMESTAMPTZ '2023-03-28 00:00:00'
    AND tag_time < TIMESTAMPTZ '2023-03-29 00:00:00';
                                                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=62.36..207.47 rows=24 width=34) (actual time=0.886..1.895 rows=24 loops=1)
   ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.042..0.142 rows=2 loops=1)
         Filter: (system_id = 50)
         Rows Removed by Filter: 298
   ->  Bitmap Heap Scan on _hyper_1_5_chunk t  (cost=62.36..100.74 rows=12 width=23) (actual time=0.788..0.816 rows=12 loops=2)
         Recheck Cond: ((tag_name = s.tag_name) AND (tag_time >= '2023-03-28 00:00:00+05:30'::timestamp with time zone) AND (tag_time < '2023-03-29 00:00:00+05:30'::timestamp with time zone))
         Heap Blocks: exact=2
         ->  BitmapAnd  (cost=62.36..62.36 rows=12 width=0) (actual time=0.768..0.772 rows=0 loops=2)
               ->  Bitmap Index Scan on _hyper_1_5_chunk_tag_secondary_idx  (cost=0.00..4.92 rows=84 width=0) (actual time=0.034..0.035 rows=84 loops=2)
                     Index Cond: (tag_name = s.tag_name)
               ->  Bitmap Index Scan on _hyper_1_5_chunk_tag_value_tag_time_idx  (cost=0.00..56.29 rows=3600 width=0) (actual time=0.722..0.723 rows=3600 loops=2)
                     Index Cond: ((tag_time >= '2023-03-28 00:00:00+05:30'::timestamp with time zone) AND (tag_time < '2023-03-29 00:00:00+05:30'::timestamp with time zone))
 Planning Time: 1.517 ms
 Execution Time: 2.026 ms
(14 rows)

But if you use TIMESTAMP, the secondary index is ignored:

EXPLAIN (ANALYZE) SELECT * FROM system_tag s, tag_value t
  WHERE s.tag_name = t.tag_name AND system_id = 50
    AND tag_time >= TIMESTAMP '2023-03-28 00:00:00'
    AND tag_time < TIMESTAMP '2023-03-29 00:00:00';
                                                                               QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.84..323.61 rows=44 width=34) (actual time=5.915..29.333 rows=24 loops=1)
   Hash Cond: (t.tag_name = s.tag_name)
   ->  Custom Scan (ChunkAppend) on tag_value t  (cost=1.06..292.64 rows=6602 width=23) (actual time=1.011..21.566 rows=3600 loops=1)
         Chunks excluded during startup: 5
         ->  Bitmap Heap Scan on _hyper_1_5_chunk t_1  (cost=57.19..272.19 rows=3600 width=23) (actual time=1.005..9.412 rows=3600 loops=1)
               Recheck Cond: ((tag_time >= '2023-03-28 00:00:00'::timestamp without time zone) AND (tag_time < '2023-03-29 00:00:00'::timestamp without time zone))
               Heap Blocks: exact=161
               ->  Bitmap Index Scan on _hyper_1_5_chunk_tag_value_tag_time_idx  (cost=0.00..56.29 rows=3600 width=0) (actual time=0.898..0.899 rows=3600 loops=1)
                     Index Cond: ((tag_time >= '2023-03-28 00:00:00'::timestamp without time zone) AND (tag_time < '2023-03-29 00:00:00'::timestamp without time zone))
   ->  Hash  (cost=5.75..5.75 rows=2 width=11) (actual time=0.174..0.179 rows=2 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on system_tag s  (cost=0.00..5.75 rows=2 width=11) (actual time=0.043..0.158 rows=2 loops=1)
               Filter: (system_id = 50)
               Rows Removed by Filter: 298
 Planning Time: 3.700 ms
 Execution Time: 29.821 ms
(16 rows)

This happens only if the hypertable is compressed; for uncompressed hypertables, the optimal plan is chosen for both TIMESTAMP and TIMESTAMPTZ values in the WHERE clause. It also occurs only with JOIN-type queries: when querying just a single table, an optimal plan that uses the secondary index is returned for both TIMESTAMP and TIMESTAMPTZ values in WHERE.
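To confirm the type mismatch at the heart of this, pg_typeof shows what each literal resolves to (comparing a timestamptz column against a plain timestamp value forces an implicit conversion at plan time):

```sql
SELECT pg_typeof(TIMESTAMP '2023-03-28 00:00:00')   AS ts_type,
       pg_typeof(TIMESTAMPTZ '2023-03-28 00:00:00') AS tstz_type;
-- ts_type: timestamp without time zone, tstz_type: timestamp with time zone
```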

Thank you for reporting this issue!

@lkshminarayanan lkshminarayanan changed the title [Bug]: Tables with compression defined on them don't leverage secondary indexes [Bug]: Secondary index is ignored for joins involving compressed tables if there is a mismatch in where clause column and value types Mar 24, 2023
@genedavis
Author

Thanks again @lkshminarayanan - that all makes perfect sense. I am likely to file another, slightly orthogonal issue, but it's something I believe is (eventually) manageable by Timescale: when querying a table that has some compressed hyperchunks, with a secondary index on the non-compressed components (setting aside the issues above), a similar issue arises when the SQL is in a function definition - just an FYI.
