[Bug] Query optimization: time-series table with +10M rows #6129

Pierre-Alexandre35 · 2023-09-28T14:55:36Z

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query executor

What happened?

Hello,

I have a table conditions with 10.5 millions rows. Those lines are coming from an IoT device that create data every 3 seconds (so 1 year is around 10.5M lines).

I am running a simple query toSELECT *and the performance seems decent (around 1 second). However, when it comes to only select a definite sample (1 row per day --> 365 rows), it's taking more than 30 seconds. Not quite sure if this should be the normal behaviour in term of performance. This is the script used to create my schema:

CREATE TABLE
  public.conditions(
    "time" timestamp with time zone NOT NULL,
    i1 real NOT NULL,
    i2 real NOT NULL,
    i3 real NOT NULL,
    i4 real NOT NULL,
    i5 real NOT NULL,
    i6 real NOT NULL,
    i7 real NOT NULL,
    i8 real NOT NULL,
    i10 real NOT NULL,
    i11 real NOT NULL
  );
  
  SELECT create_hypertable(
  'public.conditions',
  'time',
  chunk_time_interval => INTERVAL '1 day'
);

I have some performance issues to retrieve my data for more "complex" queries:

simple SELECT:

query:

EXPLAIN ANALYZE
SELECT
  *
FROM
  conditions;

result:

-----------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..256229.97 rows=10511998 width=48) (actual time=0.038..940.653 rows=10511998 loops=1)
   ->  Seq Scan on _hyper_1_1_chunk  (cost=0.00..557.98 rows=28798 width=48) (actual time=0.035..3.801 rows=28798 loops=1)
   ->  Seq Scan on _hyper_1_2_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.018..2.568 rows=28800 loops=1)
   ->  Seq Scan on _hyper_1_3_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.005..2.791 rows=28800 loops=1)
   ->  Seq Scan on _hyper_1_4_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.007..2.472 rows=28800 loops=1)
   ->  Seq Scan on _hyper_1_5_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.007..2.629 rows=28800 loops=1)
   ->  Seq Scan on _hyper_1_6_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.006..2.249 rows=28800 loops=1)
   ->  Seq Scan on _hyper_1_7_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.005..1.950 rows=28800 loops=1)
   ->  Seq Scan on _hyper_1_8_chunk  (cost=0.00..558.00 rows=28800 width=48) (actual time=0.006..1.873 rows=28800 loops=1)
tual time=0.028..1.828 rows=28800 loops=1)
ctual time=0.005..1.673 rows=28800 loops=1)

More complex SELECT:

query (only display 1 row per day - since

EXPLAIN ANALYZE
SELECT
  DISTINCT ON (DATE_TRUNC('day', time)) *
FROM
  conditions
WHERE
  DATE_TRUNC('day', time) BETWEEN '2022-01-01'
  AND '2022-12-31'
ORDER BY
  DATE_TRUNC('day', time),
  time;

result:

...
                           Filter: ((date_trunc('day'::text, "time") >= '2022-01-01 00:00:00+00'::timestamp with time zone) AND (date_trunc('day'::text, "time") <= '2022-12-31 00:00:00+00'::timestamp with time zone))
                     ->  Parallel Seq Scan on _hyper_1_364_chunk  (cost=0.00..608.82 rows=85 width=48) (actual time=0.024..5.996 rows=28800 loops=1)
                           Filter: ((date_trunc('day'::text, "time") >= '2022-01-01 00:00:00+00'::timestamp with time zone) AND (date_trunc('day'::text, "time") <= '2022-12-31 00:00:00+00'::timestamp with time zone))
                     ->  Parallel Seq Scan on _hyper_1_365_chunk  (cost=0.00..608.82 rows=85 width=48) (actual time=0.013..3.150 rows=14400 loops=2)
                           Filter: ((date_trunc('day'::text, "time") >= '2022-01-01 00:00:00+00'::timestamp with time zone) AND (date_trunc('day'::text, "time") <= '2022-12-31 00:00:00+00'::timestamp with time zone))
                     ->  Parallel Seq Scan on _hyper_1_1_chunk  (cost=0.00..608.80 rows=85 width=48) (actual time=0.020..1.652 rows=7200 loops=4)
                           Filter: ((date_trunc('day'::text, "time") >= '2022-01-01 00:00:00+00'::timestamp with time zone) AND (date_trunc('day'::text, "time") <= '2022-12-31 00:00:00+00'::timestamp with time zone))
 Planning Time: 40.823 ms
 JIT:
   Functions: 2929
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 167.781 ms, Inlining 0.000 ms, Optimization 31.081 ms, Emission 735.023 ms, Total 933.885 ms
 Execution Time: 33151.422 ms
(745 rows)

I have also tried this with a little improvement but still around 25 seconds:

SELECT DISTINCT ON (time_bucket('1 day', "time")) 
  time_bucket('1 day', "time") as day,
  i1,
  i2,
  i3,
  i4,
  i5,
  i6,
  i7,
  i8,
  i10,
  i11
FROM
  public.conditions
ORDER BY
  time_bucket('1 day', "time"), "time";

TimescaleDB version affected

2.5

PostgreSQL version used

14

What operating system did you use?

Apple M2 Pro

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

The text was updated successfully, but these errors were encountered:

antekresic · 2023-10-02T07:33:47Z

Hi,

Looks like you might be hitting the JIT performance regression issue here which is a more general PG problem with partitioned tables. We have an outstanding issue about investigating if we can do something about this problem.

In the meantime, disabling JIT for these types of queries should help here. You can do that by disabling it for the role you are using to connect like this:

alter role mytestrole set jit=off;

Note: you will have to reconnect for this to take affect.

Let me know if this helps with the performance.

alex-tate · 2023-10-03T11:05:46Z

I haven't got anything to add about the JIT comment, but the slow-runing query presented could be tackled with an AsOf Join. This is currently the most voted TimescaleDB feature request (see #271) but for now you could use a generated timeseries and a lateral join which would speed up the specfic query example given. Something similar to...

SELECT cond.*
from (select * from generate_series('2022-01-01 00:00:00'::timestamptz, '2022-12-31 00:00:00'::timestamptz, '1 day'::interval)) as gs(x)
cross join lateral 
(select * from conditions
    where conditions.time >= gs.x
    order by conditions.time asc limit 1) as cond;

I tested on a one second sensor table (30M rows per year) that I have and got a 70s run time on the originl query versus 100ms run time for the lateral join method.

github-actions · 2024-10-21T02:01:28Z

Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days.

Thank you!

github-actions · 2024-11-20T02:02:34Z

Dear Author,

We are closing this issue due to lack of activity. Feel free to add a comment to this issue if you can provide more information and we will re-open it.

Thank you!

Pierre-Alexandre35 added the bug label Sep 28, 2023

horzsolt added the waiting-for-author label Dec 18, 2023

github-actions bot added the no-activity label Oct 21, 2024

github-actions bot closed this as completed Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Query optimization: time-series table with +10M rows #6129

[Bug] Query optimization: time-series table with +10M rows #6129

Pierre-Alexandre35 commented Sep 28, 2023 •

edited

Loading

antekresic commented Oct 2, 2023

alex-tate commented Oct 3, 2023

github-actions bot commented Oct 21, 2024

github-actions bot commented Nov 20, 2024

[Bug] Query optimization: time-series table with +10M rows #6129

[Bug] Query optimization: time-series table with +10M rows #6129

Comments

Pierre-Alexandre35 commented Sep 28, 2023 • edited Loading

What type of bug is this?

What subsystems and features are affected?

What happened?

TimescaleDB version affected

PostgreSQL version used

What operating system did you use?

What installation method did you use?

What platform did you run on?

Relevant log output and stack trace

How can we reproduce the bug?

antekresic commented Oct 2, 2023

alex-tate commented Oct 3, 2023

github-actions bot commented Oct 21, 2024

github-actions bot commented Nov 20, 2024

Pierre-Alexandre35 commented Sep 28, 2023 •

edited

Loading