Support declarative partitioning method #1154

Open
bocaph opened this issue Apr 10, 2019 · 17 comments

Comments

@bocaph

bocaph commented Apr 10, 2019

Hi,
Thanks for working on the amazing TimescaleDB project.
I am new to TimescaleDB, and this is just a question for confirmation rather than an issue report.
Do you have a roadmap for switching from trigger-based partitioning to the declarative partitioning method?
We ran some performance tests comparing TimescaleDB against PostgreSQL 11 declarative partitioning and saw a small performance gain for PostgreSQL 11 declarative partitioning (7% on a bulk INSERT of 30 GB of data into 10 child tables, using Dell storage).

@erimatnor
Contributor

Hi @bocaph. TimescaleDB does not use trigger-based partitioning. It actually uses a custom, highly optimized insert plan, similar to declarative partitioning. The slight performance advantage you see for PG11 is more or less expected, because with declarative partitioning you pre-create all subtables (partitions) before you insert data. In contrast, TimescaleDB creates its "chunk" subtables on demand, as they are needed, which adds a little overhead during inserts. The upside is that you need not manually (pre-)partition your table. You'll see a performance advantage with TimescaleDB once you have a lot of subtables; 10 child tables are not enough to show it.

Please see this blog post for further explanations on differences, including benchmarks: https://blog.timescale.com/time-series-data-postgresql-10-vs-timescaledb-816ee808bac5/
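For readers comparing the two approaches, here is a minimal sketch of each setup (table and column names are illustrative, not from the benchmarks in this thread):

-- TimescaleDB: one call turns the table into a hypertable; chunk
-- subtables are then created on demand as data arrives.
CREATE TABLE conditions (time timestamptz NOT NULL, location text, temperature float);
SELECT create_hypertable('conditions', 'time');

-- PG11 declarative partitioning: every partition must be created up front.
CREATE TABLE conditions_pg (time timestamptz NOT NULL, location text, temperature float)
    PARTITION BY RANGE ("time");
CREATE TABLE conditions_pg_2019_04 PARTITION OF conditions_pg
    FOR VALUES FROM ('2019-04-01') TO ('2019-05-01');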

@fvannee
Contributor

fvannee commented Apr 10, 2019

+1 for this - currently it's quite a limitation that once your chunk count grows beyond roughly 500, planning time increases to more than 500 ms for even the simplest SELECT queries, even when they use chunk exclusion. In PostgreSQL 12 (current master branch), planning time with 500 partitions is less than 1 ms (for static chunk exclusion). Furthermore, partitioned tables have run-time pruning, which drastically speeds up queries with dynamic values. Neither of these is available to inheritance-based approaches.
Since PG12, attaching a partition no longer requires an ACCESS EXCLUSIVE lock on the partitioned table, so I think it might even be possible to implement TimescaleDB's dynamic chunk creation on top of native partitioning now - although I have to admit I don't know the implementation details of either solution well enough to be certain about this.
@erimatnor You will probably have a better overview of the potential advantages/disadvantages of adopting native partitioning. What are your views on this topic? I feel like the blog post you link to contains mostly out-of-date information (there is now native support for hash partitioning, there are optimizations specific to partitioned tables that inheritance-based solutions don't get, and in my benchmarks INSERT performance in PG12 with 1000 or 6000 partitions is just as fast as TimescaleDB). What's left that's blocking?
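To make the locking point concrete, here is a hypothetical sketch of attaching a pre-created chunk under the reduced PG12 locking rules, assuming a range-partitioned table like the conditions_pg sketched above (names are illustrative; a CHECK constraint matching the partition bounds lets ATTACH PARTITION skip the validation scan):

-- Pre-create the chunk with the parent's structure and bounds.
CREATE TABLE conditions_pg_2019_05 (LIKE conditions_pg INCLUDING ALL);
ALTER TABLE conditions_pg_2019_05 ADD CONSTRAINT chunk_bounds
    CHECK ("time" >= '2019-05-01' AND "time" < '2019-06-01');
-- In PG12 this takes SHARE UPDATE EXCLUSIVE rather than ACCESS EXCLUSIVE.
ALTER TABLE conditions_pg ATTACH PARTITION conditions_pg_2019_05
    FOR VALUES FROM ('2019-05-01') TO ('2019-06-01');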

@erimatnor
Contributor

erimatnor commented Apr 10, 2019

@fvannee Believe me, if it were easy to adopt partitioned tables, we'd do it in a heartbeat. Still, due to technical differences in how we deal with partitioning, it is not going to be easy, if it is possible at all without major re-engineering. Those would also be breaking changes, so we would not be able to support existing TimescaleDB databases.

I can't comment on changes in PG12 that might have removed some of the big blockers, like opening all subrelations during INSERTs. Sure, if you insert a lot of data in a single transaction, you effectively amortize the cost of opening 6000 tables even if you don't insert into all of them. But if you often insert batches into the "last" partition of a time-partitioned table, it probably doesn't make sense to also open the 5999 other subtables to do that insert.

The only thing I can say in conclusion is that we are constantly on the lookout for how things develop with partitioned tables, and any decision on adopting partitioned tables depends on whether the pros outweigh the cons.

@fvannee
Contributor

fvannee commented Apr 10, 2019

Please don't get me wrong - I know this would be a change that's definitely not easy. It would require effort to get it working in the first place, and migrating from previous versions would be another issue altogether. So for now, the effort required is a valid reason not to implement it. However, I think the reasons laid out in the blog post are (or will soon be) mostly outdated.

@bocaph
Author

bocaph commented Apr 10, 2019

Thanks, @erimatnor, @fvannee for your comments.

I had already read the blog post before creating this topic.
# Sorry for mislabeling TimescaleDB as trigger-based partitioning.
And I agree with @fvannee that we should consider some of the valuable declarative-partitioning updates in
PG 11 (faster partition pruning, partition pruning at execution time, the additional hash method).
# Maybe not a full switch, but an additional declarative option.
I will also run some comparison tests with PG 12, and post an update here if there are any interesting results.

Just for reference, here are the results of my test on PG 11.2.

TimescaleDB 1.2.2

EXPLAIN (ANALYZE, BUFFERS) INSERT INTO conditions_timesc SELECT * FROM conditions;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Custom Scan (HypertableInsert)  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=3000785.069..3000785.069 rows=0 loops=1)
   Buffers: shared hit=1851493119 read=19929798 dirtied=16015270 written=14424221
   ->  Insert on conditions_timesc  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=3000785.068..3000785.068 rows=0 loops=1)
         Buffers: shared hit=1851493119 read=19929798 dirtied=16015270 written=14424221
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=15.050..245411.987 rows=414720016 loops=1)
               Buffers: shared hit=7092 read=3875550 dirtied=98 written=15
               ->  Seq Scan on conditions  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=0.019..112648.679 rows=414720016 loops=1)
                     Buffers: shared read=3875373
Planning Time: 0.575 ms
Execution Time: 3000785.264 ms
(10 rows)

Time: 3000812.120 ms (50:00.812)
                                        Table "public.conditions_timesc"
   Column    |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
-------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 time        | timestamp with time zone |           | not null |         | plain    |              |
 location    | text                     |           | not null |         | extended |              |
 temperature | double precision         |           |          |         | plain    |              |
 humidity    | double precision         |           |          |         | plain    |              |
 id          | integer                  |           |          |         | plain    |              |
Indexes:
    "conditions_timesc_time_idx" btree ("time" DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON conditions_timesc FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_44_chunk,
              _timescaledb_internal._hyper_1_45_chunk,
              _timescaledb_internal._hyper_1_46_chunk,
              _timescaledb_internal._hyper_1_47_chunk,
              _timescaledb_internal._hyper_1_48_chunk,
              _timescaledb_internal._hyper_1_49_chunk,
              _timescaledb_internal._hyper_1_50_chunk,
              _timescaledb_internal._hyper_1_51_chunk,
              _timescaledb_internal._hyper_1_52_chunk,
              _timescaledb_internal._hyper_1_53_chunk,
              _timescaledb_internal._hyper_1_54_chunk

PostgreSQL 11.2 declarative partitioning

EXPLAIN (ANALYZE, BUFFERS) INSERT INTO conditions_declare_par SELECT * FROM conditions;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Insert on conditions_declare_par  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=2846752.128..2846752.128 rows=0 loops=1)
   Buffers: shared hit=1728205294 read=20406837 dirtied=16529816 written=15015313
   ->  Seq Scan on conditions  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=0.013..109683.598 rows=414720016 loops=1)
         Buffers: shared hit=4 read=3875369
Planning Time: 0.419 ms
Execution Time: 2846752.375 ms
(6 rows)

Time: 2846757.200 ms (47:26.757)
                                      Table "public.conditions_declare_par"
   Column    |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
-------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 time        | timestamp with time zone |           | not null |         | plain    |              |
 location    | text                     |           | not null |         | extended |              |
 temperature | double precision         |           |          |         | plain    |              |
 humidity    | double precision         |           |          |         | plain    |              |
 id          | integer                  |           |          |         | plain    |              |
Partition key: RANGE ("time")
Indexes:
    "time_par_idx" btree ("time")
Partitions: conditions_declare_par_child01 FOR VALUES FROM ('2019-04-09 00:00:00+09') TO ('2019-05-09 00:00:00+09'),
            conditions_declare_par_child02 FOR VALUES FROM ('2019-05-09 00:00:00+09') TO ('2019-06-08 00:00:00+09'),
            conditions_declare_par_child03 FOR VALUES FROM ('2019-06-08 00:00:00+09') TO ('2019-07-08 00:00:00+09'),
            conditions_declare_par_child04 FOR VALUES FROM ('2019-07-08 00:00:00+09') TO ('2019-08-07 00:00:00+09'),
            conditions_declare_par_child05 FOR VALUES FROM ('2019-08-07 00:00:00+09') TO ('2019-09-06 00:00:00+09'),
            conditions_declare_par_child06 FOR VALUES FROM ('2019-09-06 00:00:00+09') TO ('2019-10-06 00:00:00+09'),
            conditions_declare_par_child07 FOR VALUES FROM ('2019-10-06 00:00:00+09') TO ('2019-11-05 00:00:00+09'),
            conditions_declare_par_child08 FOR VALUES FROM ('2019-11-05 00:00:00+09') TO ('2019-12-05 00:00:00+09'),
            conditions_declare_par_child09 FOR VALUES FROM ('2019-12-05 00:00:00+09') TO ('2020-01-04 00:00:00+09'),
            conditions_declare_par_child10 FOR VALUES FROM ('2020-01-04 00:00:00+09') TO ('2020-02-04 00:00:00+09')

@bocaph
Author

bocaph commented Apr 11, 2019

Hi @erimatnor, @fvannee

Just for reference, here is an update on INSERT performance in PG 11 vs TimescaleDB in my environment.
The degradation with a large number of chunks
(here I tested with over 2,000 child tables) does not show up in PG 11, unlike in your post.

By the way, I am not seeing any breakage in PG 12 either (regarding INSERT performance).

TimescaleDB 1.2.2

postgres=# explain (analyze, buffers) insert into conditions_timesc select * from conditions;
                                                                    QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------
Custom Scan (HypertableInsert)  (cost=0.00..8022580.17 rows=414720717 width=42) (actual time=2787997.225..2787997.225 rows=0 loops=1)
   Buffers: shared hit=1276847924 read=20069055 dirtied=16138128 written=14669150
   ->  Insert on conditions_timesc  (cost=0.00..8022580.17 rows=414720717 width=42) (actual time=2787997.224..2787997.224 rows=0 loops=1)
         Buffers: shared hit=1276847924 read=20069055 dirtied=16138128 written=14669150
         ->  Custom Scan (ChunkDispatch)  (cost=0.00..8022580.17 rows=414720717 width=42) (actual time=12.731..496809.537 rows=414720016 loops=1)
               Buffers: shared hit=6976968 read=3881421 dirtied=3590 written=4054
               ->  Seq Scan on conditions  (cost=0.00..8022580.17 rows=414720717 width=42) (actual time=0.017..183077.727 rows=414720016 loops=1)
                     Buffers: shared hit=6 read=3875367
Planning Time: 0.529 ms
Execution Time: 2787997.840 ms
(10 rows)

Time: 2788088.903 ms (46:28.089)
postgres=# \d conditions_timesc
                    Table "public.conditions_timesc"
   Column    |           Type           | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
 time        | timestamp with time zone |           | not null |
 location    | text                     |           | not null |
 temperature | double precision         |           |          |
 humidity    | double precision         |           |          |
Indexes:
    "conditions_timesc_time_idx" btree ("time" DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON conditions_timesc FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
Number of child tables: 2401 (Use \d+ to list them.)

PostgreSQL 11.2 declarative partitioning

# Time to create the tables: about 1m50.429s

$ time ./createtbl.sh > /dev/null

real    1m50.429s
user    0m4.925s
sys     0m9.136s
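
The createtbl.sh script isn't included; a hypothetical SQL equivalent that pre-creates the 2412 partitions (the daily ranges and naming scheme here are assumptions for illustration):

DO $$
BEGIN
    FOR i IN 0..2411 LOOP
        -- Each iteration creates one one-day range partition.
        EXECUTE format(
            'CREATE TABLE conditions_declare_par_child%s PARTITION OF conditions_declare_par
                 FOR VALUES FROM (%L) TO (%L)',
            i, date '2019-04-09' + i, date '2019-04-09' + i + 1);
    END LOOP;
END $$;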

postgres=# explain (analyze, buffers) insert into conditions_declare_par select * from conditions;
                                                               QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------
Insert on conditions_declare_par  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=2783046.827..2783046.827 rows=0 loops=1)
   Buffers: shared hit=1386838237 read=20581596 dirtied=16656432 written=15182389
   ->  Seq Scan on conditions  (cost=0.00..8022580.17 rows=414720717 width=46) (actual time=0.024..183790.473 rows=414720016 loops=1)
         Buffers: shared hit=14 read=3875359 written=9
Planning Time: 1.328 ms
Execution Time: 2783061.255 ms
(6 rows)

Time: 2783110.947 ms (46:23.111)

postgres=# \d conditions_declare_par
                  Table "public.conditions_declare_par"
   Column    |           Type           | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
 time        | timestamp with time zone |           | not null |
 location    | text                     |           | not null |
 temperature | double precision         |           |          |
 humidity    | double precision         |           |          |
Partition key: RANGE ("time")
Indexes:
    "time_par_idx" btree ("time")
Number of partitions: 2412 (Use \d+ to list them.)

Thanks,

@erimatnor
Contributor

erimatnor commented Apr 11, 2019

Thanks for these benchmarks @bocaph. Maybe I am misreading something in these numbers, but it looks like PG11.2 and TimescaleDB are on-par w.r.t. insert time (46:23 vs 46:28), which is to be expected (both are fundamentally PostgreSQL underneath and can't go faster than allowed by the underlying system).

However, factor in the table creation time of 1:50, and TimescaleDB comes out faster. Furthermore, these are ideal test conditions for declarative partitioning, because you are bulk loading all data in one transaction, allowing PG11 to amortize the cost of opening all 2412 tables over the entire data set. This is why you are not seeing the performance degradation shown in our benchmarks. However, if you inserted the data in time order and in batches of, e.g., 1000 rows, which would better represent real-world workloads, you would likely see much worse performance from PG11.2, since for every batch it would have to open all 2412 subtables (unless something has changed recently). I'd suggest using a benchmark suite like TSBS to test something that more accurately resembles production workloads, as opposed to bulk loading.
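
For illustration, a single batch of the kind described above might look like this (a hypothetical sketch; table and values assumed). With time-ordered data, every such batch lands in only the newest partition, yet each batch still pays the cost of opening all subtables:

BEGIN;
INSERT INTO conditions (time, location, temperature, humidity)
SELECT now() + (i || ' milliseconds')::interval, 'office', 20.0, 50.0
FROM generate_series(1, 1000) AS i;
COMMIT;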

Finally, w.r.t. feature sets, I will note that TimescaleDB already supports fast partition pruning, partition pruning at execution time, and hash partitioning, and it did so long before PostgreSQL. We also do a number of other performance and planner optimizations that are not in upstream PostgreSQL. By switching to declarative partitioning, we'd risk losing those optimizations, since declarative partitioning is a much more restrictive and opinionated partitioning scheme compared to regular inheritance.
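
For reference, multi-dimensional (time plus hash) partitioning is set up in TimescaleDB with create_hypertable and add_dimension; a minimal sketch (table name illustrative):

CREATE TABLE conditions (time timestamptz NOT NULL, location text NOT NULL, temperature float);
SELECT create_hypertable('conditions', 'time');
-- Add a hash-partitioned "space" dimension with 4 partitions.
SELECT add_dimension('conditions', 'location', number_partitions => 4);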

That said, if there was a clear path towards adopting declarative partitioning in TimescaleDB, including many benefits, we'd certainly do it.

We very much appreciate this feedback though, and in particular the efforts put into producing numbers. We welcome further input and will certainly weigh our options based on what the community is telling us.

@erimatnor
Contributor

@fvannee I am not sure that blog post will be entirely outdated as soon as you think it will be. I might be missing something, but, of the issues raised in the blog post, the only changes in PostgreSQL's favor so far are the addition of hash partitioning and unique index support (both long-standing TimescaleDB features). Also note that the blog post does not make the case against using declarative partitioning in TimescaleDB. It simply argues for the UX and performance advantages of TimescaleDB over plain declarative partitioning for typical use cases. Thus, I'd say that blog post would remain valid even if we adopted declarative partitioning under the hood, since one of the main advantages of TimescaleDB is automation and various performance optimizations around time series.

Still, those points aside, there remain major blockers raised in that blog post:

  • Multidimensional partitioning in PG11 is inherently less flexible given how it is built on table hierarchies. One example is in repartitioning, which is a major feature of TimescaleDB. For instance, changing the number of hash partitions in PG11 is difficult, if not impossible (example and explanation given in blog post). Thus, it is not clear how we'd support flexible repartitioning in case we adopt declarative partitioning.
  • AFAIK, insert performance with many subtables is still an issue, and I haven't seen any indication that this will change. For the use case declarative partitioning targets, this makes sense. (Note that bulk inserts, like in @bocaph's benchmarks above, will not show this issue.)

To round off, we are looking at ways in which we will be able to leverage some of the work and performance optimizations put into declarative partitioning, without outright adopting the full approach. See for instance this PR: #1134. More work coming.

@fvannee
Contributor

fvannee commented Apr 11, 2019

@erimatnor Exactly, TimescaleDB has a lot of advantages over declarative partitioning in automation and flexibility. I'm afraid I won't be able to give any useful comments on multi-dimensional partitioning, as I haven't used it myself. I do recognize it could be an important feature for other people, though.
Let me come up with some performance figures for typical use cases as well. As a starter, I've attached a comparison of INSERT performance between PG11+TimescaleDB and the current PG master branch. (I also tested PG11 partitioning, but that one becomes much slower with a large number of partitions.)
The load is similar to the test cases prescribed in the blog post:

  • Inserting 1000 rows per transaction
  • Always inserting to the same chunk
  • Tested with 16, 1024, and 8096 partitions (ts = TimescaleDB, p = PG12 partitioned)

Note that I didn't run these for very long, so the noise is quite high. They perform roughly the same, I'd say. But the important thing is that the massive degradation in insert performance is no longer present. (A sketch of what one of the insert scripts might look like follows below.)
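
The insert_ts*.sql and insert_p*.sql scripts themselves aren't attached; a minimal sketch of what such a pgbench script might contain, per the description above (table and values assumed):

-- One transaction: 1000 rows, all into the newest chunk.
INSERT INTO conditions
SELECT now(), 'loc' || (i % 100), random() * 40, random() * 100
FROM generate_series(1, 1000) AS i;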
[postgres@opamuxvm0600~] pgbench -n -T 10 -f insert_ts16.sql -p 5432
transaction type: insert_ts16.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 1971
latency average = 5.076 ms
tps = 196.999988 (including connections establishing)
tps = 197.040342 (excluding connections establishing)
[postgres@opamuxvm0600~] pgbench -n -T 10 -f insert_ts1024.sql -p 5432
transaction type: insert_ts1024.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 1906
latency average = 5.247 ms
tps = 190.580967 (including connections establishing)
tps = 190.622010 (excluding connections establishing)
[postgres@opamuxvm0600~] pgbench -n -T 10 -f insert_ts8096.sql -p 5432
transaction type: insert_ts8096.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 1987
latency average = 5.033 ms
tps = 198.677767 (including connections establishing)
tps = 198.724725 (excluding connections establishing)

[postgres@opamuxvm0600~] pgbench -n -T 10 -f insert_p8096.sql -p 4432
transaction type: insert_p8096.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 2324
latency average = 4.304 ms
tps = 232.366927 (including connections establishing)
tps = 232.419518 (excluding connections establishing)
[postgres@opamuxvm0600~] pgbench -n -T 10 -f insert_p1024.sql -p 4432
transaction type: insert_p1024.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 2339
latency average = 4.276 ms
tps = 233.875909 (including connections establishing)
tps = 233.935038 (excluding connections establishing)
[postgres@opamuxvm0600~] pgbench -n -T 10 -f insert_p16.sql -p 4432
transaction type: insert_p16.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 2357
latency average = 4.243 ms
tps = 235.674131 (including connections establishing)
tps = 235.746259 (excluding connections establishing)

@fvannee
Contributor

fvannee commented Apr 11, 2019

Edit: I added a few test cases because the previous test cases were inconsistent.

I've prepared some benchmarks for typical use cases. I've attached the SQL queries that I benchmarked, the SQL code to setup the tables for the benchmark, the raw results and some figures to provide a nice overview.
I ran tests for PG11 Partitioned, PG11 TimescaleDB, and the current master branch of PG (called PG12 in the tests). Because TimescaleDB doesn't support non-stable builds of Postgres, I unfortunately wasn't able to run TimescaleDB with the PG master branch. I did test runs for 16, 64, 256, 1024, and 4096 partitions. Every test run was done with pgbench for 60 seconds. You can find the exact commands in the raw output.
Note that most of the SELECT queries benchmarked here are basically testing planning time, as this is generally the part that becomes much slower as more partitions are added. The queries are written such that only one or two partitions are relevant to a particular query, and the rest should be discarded as early as possible. Execution time is very small for these queries, as they just do a simple index scan on the remaining chunk. This is a typical use case that we see very often: retrieving just a single row or a few rows from one particular chunk.
The use cases that I benchmarked are as follows (a sketch of the function variants from cases 6 and 7 follows the list):

  1. Insert batches of 1000 rows per transaction
  2. Simple SELECT query pruning on a static timestamp
  3. Simple SELECT query pruning on a timestamp now()
  4. The same SELECT query with static timestamp but with an added 'ORDER BY a, updated_at DESC LIMIT 1', which matches the index defined on the table
  5. The same SELECT query with dynamic timestamp but with an added 'ORDER BY a, updated_at DESC LIMIT 1', which matches the index defined on the table
  6. The same simple SELECT query, but now it's wrapped inside an inlineable SQL function, called with a static timestamp
  7. The same simple SELECT query, but now it's wrapped inside a non-inlineable SQL function, called with a static timestamp
  8. The same simple SELECT query, but now it's wrapped inside a plpgsql function, called with a static timestamp
  9. The same simple SELECT query, but now it's wrapped inside an inlineable SQL function, called with a dynamic timestamp
  10. The same simple SELECT query, but now it's wrapped inside a non-inlineable SQL function, called with a dynamic timestamp
  11. The same simple SELECT query, but now it's wrapped inside a plpgsql function, called with a dynamic timestamp
  12. The same query as 2) but then in an inlineable function
  13. The same query as 3) but then in an inlineable function
  14. A SELECT with nested loop with opportunities for run-time pruning - some rows from a table are selected and the timestamp from rows in that table is used to join on another partitioned table
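
Cases 6 through 13 hinge on whether the wrapped query can be inlined. A hypothetical sketch of the two SQL-function variants (table and column names assumed; the actual benchmark functions are in the attached benchmark.sql.txt):

-- Inlineable: a single SELECT, not VOLATILE, so the planner can expand
-- it into the calling query and prune partitions statically.
CREATE FUNCTION get_rows_inline(ts timestamptz)
RETURNS SETOF conditions
LANGUAGE sql STABLE AS $$
    SELECT * FROM conditions
    WHERE time >= ts AND time < ts + interval '1 hour';
$$;

-- Declaring it VOLATILE breaks the inlining rules, so the function runs
-- with its own plan and pruning can only happen at run time.
CREATE FUNCTION get_rows_noinline(ts timestamptz)
RETURNS SETOF conditions
LANGUAGE sql VOLATILE AS $$
    SELECT * FROM conditions
    WHERE time >= ts AND time < ts + interval '1 hour';
$$;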

These are the results. Note the log scale on both axes.

[figure: insert]
Insert performance of PG12 is consistently higher (probably because it doesn't have to check whether a new chunk needs to be created). Even with a large number of chunks, there's no degradation anymore like what we see for PG11. Both PG12 and TimescaleDB stay consistent even for a large number of chunks.

[figure: select_static]
TimescaleDB and PG12 stay nicely constant here as the number of partitions increases, unlike PG11 partitioning. PG12 partitioned is about 4x as fast, though.

[figure: select_static_limit]
These results are comparable to 2). Throughput is slightly lower, but it doesn't differ much.

[figure: select_static_fnc]
It's interesting to compare this one to figure 2. TimescaleDB seems to have a problem with inlineable functions. Even though they result in the exact same plan, planning them is much faster in PG12 than in PG11 and TimescaleDB.

[figure: select_static_fnc_ni]
There's serious degradation in all cases here, because a non-inlineable SQL function always uses a generic plan, in which pruning has to occur at run time. TPS of PG12 partitioning is about twice that of TimescaleDB on PG11.

[figure: select_static_fnc_plpg]
Interestingly enough, the degradation is not as bad here: a plpgsql function only switches to a generic plan if the planner deems it faster after trying it out for the first few executions. Here it's obviously not faster, and plpgsql makes the right choice, so we see results similar to 2).

[figure: select_now]
Pruning with now() or current_date is where it gets interesting. Even with a simple SELECT, we now see degradation in all cases. This is because pruning has to occur later, due to the nature of these functions (they may change value between executions of, for example, prepared statements, because they're marked STABLE). PG11 partitioning actually performs relatively well here, but none of them scale well.

[figure: select_now_limit]
Adding an ORDER BY a, updated_at DESC LIMIT 1 lowers TPS even more drastically. It looks like this addition has a significant impact on planning time. I am not sure where this overhead comes from compared to 7).

[figure: select_now_fnc]
Results look similar to the bare SELECT with a now() timestamp.

[figure: select_now_fnc_ni]
Not much difference between non-inlined and inlined functions here; it's all up to run-time pruning anyway. Heavy degradation as the number of partitions increases.

[figure: select_now_fnc_plpg]
This one is interesting, because the plpgsql function actually turns the now() parameter passed into it into a static value. This effectively reduces this case to the static SELECT case, so performance is much better.

[figure: select_static_fnc_limit]
Putting the SELECT with the LIMIT clause inside a function doesn't make a difference for PG11 (always bad) or PG12 (always good). TimescaleDB shows degradation just like in figure 4).

[figure: select_now_fnc_limit]
Putting the SELECT with the LIMIT clause inside a function doesn't make a difference for performance. Comparable to 8).

[figure: select_nested_loop]
Although the graph doesn't show it, this query is executed drastically differently across versions. PG11 and TimescaleDB just execute the nested part 10 times for every partition. Since this is an index scan, it's relatively fast anyway, as it soon detects that none of these partitions actually contain interesting tuples. However, they do have to start that index scan for every partition, which is quite some overhead. PG12 has the run-time pruning optimization here, which means the index scan is only executed for those partitions that cannot be pruned by the join condition. This makes it about twice as fast.
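
A hypothetical shape of the case-14 query (table names assumed): the timestamps produced by the outer scan drive run-time pruning of the partitioned table in PG12, so only matching partitions are probed.

SELECT c.*
FROM events e
JOIN conditions c ON c.time = e.event_time
WHERE e.id BETWEEN 1 AND 10;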

benchmark.sql.txt
results.txt

@erimatnor
Contributor

Thanks for these benchmarks @fvannee, some pretty awesome stuff there. We will definitely be on the lookout for improvements in PG12, and hopefully we'll be able to benefit from those as well.

@bocaph
Author

bocaph commented Apr 16, 2019

Thanks, @fvannee, for the benchmarks. Thanks, @erimatnor, I am very glad to hear it.

@fvannee
Contributor

fvannee commented Apr 17, 2019

@erimatnor Thanks for the great work on TimescaleDB. Would you be able to elaborate on case 4)? It seems strange to me that a regular query is fast while the exact same query, inlined from a function, is slow. Because plain PG11 is slow in both cases, I assume TimescaleDB optimizes this somewhere but doesn't handle the inlined-function case well?

@Jan-M

Jan-M commented Jul 25, 2019

Nice work testing this and thanks for sharing the summary!

@thomasdba

Still no plan to support it?

@erimatnor
Contributor

erimatnor commented Feb 15, 2022

A simple POC revealed the following issues with using table partitioning.

  1. To create a multi-dimensional partitioned table, one must pre-create the subpartitions. For instance, to create a partitioned table one can do:

    CREATE TABLE measurement (time timestamptz, device int, temp float) PARTITIONED BY RANGE (time);

    But this only partitions the table on "time" and there's no information yet on how to partition on "device" as well. To also partition on "device" one must first add a subtable in the time dimension:

    CREATE TABLE mytable_t1 PARTITION OF measurement
        FOR VALUES FROM ('2006-02-01') TO ('2006-03-01')
    PARTITION BY HASH (device);

    But this already locks us into a specific time partition just to tell the system that we also want to partition on "device". It also locks us into a specific range for all the child device partitions. This is a major departure from how TimescaleDB partitions data, where you can potentially combine different, overlapping time ranges as long as they exist in different parts of the device dimension. Further, when we create a chunk on an empty table as above, we need to create two or more subtables, depending on how many sub-dimensions there are.
    One way around this issue is to use table partitioning only for the first dimension and then add the additional dimensions as regular table constraints. However, it is unclear whether this would work well with how partitioning works internally, since there is no partitioning state for the additional dimensions beyond the constraint.

  2. Partitioned subtables cannot be created on demand. PostgreSQL doesn't allow attaching partitions to a table that is being used in the same query, so creating a chunk on insert might not be possible. In the POC code, I ran into the following error:

    postgres=# insert into hyper values (now(), 1, 1.0);
    ERROR:  cannot CREATE TABLE .. PARTITION OF "hyper" because it is being used by active queries in this session

Now, there might be ways to get around these issues, so the investigation continues.

POC code: https://github.com/erimatnor/timescaledb/tree/hypertables-with-table-partitioning-poc

@erimatnor
Contributor

erimatnor commented Apr 21, 2023

Here's the PostgreSQL commit behind the issue mentioned above, i.e., the check that prevents creating new partitions during inserts: postgres/postgres@dc3e436b191

This code in PostgreSQL is currently a major blocker for adopting declarative partitioning in TimescaleDB. But there are other reasons too why it might not be a good idea.
