Use space dimension as a retention setting. #563
Comments
Hi @PIdaho, would you mind sharing a bit more, at a high level, about the desired functionality you want, or the use case you have? The above feels like it's trying to shoehorn an existing mechanism into a use case, and I'd want to make sure there isn't a more elegant or first-class way of supporting your desired functionality.
We collect time-series data from many devices. The data samples are basically the same, but some are at higher resolution and some at lower resolution. We have requirements for different retention policies based on certain devices. Currently we manage these retention policies by putting data for the same policy in the same partitions; we can then drop partitions for quick deletes. It is basically one table with all of the related time-series data, where each device's data has its own retention policy. It seems like adding a partition column for the retention policy would make this work, as long as I know how partition columns are assigned to chunks and I have a safe way of dropping chunks by retention policy. TimescaleDB has some really nice features. We like being able to use the COPY command on the parent table instead of having to figure out which partition to COPY into. We also like the dynamic partition creation. These would greatly simplify our partition management.
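For context, a minimal sketch of the partition-per-policy approach described here, using plain PostgreSQL declarative partitioning. The table name, policy values, and time ranges are illustrative assumptions, not details from the comment.

```sql
-- One list partition per retention policy, sub-partitioned by time so that
-- expired time ranges can be dropped wholesale (names are illustrative).
CREATE TABLE samples (
    ts               timestamptz NOT NULL,
    retention_policy text        NOT NULL,
    device_id        int,
    value            double precision
) PARTITION BY LIST (retention_policy);

CREATE TABLE samples_30d PARTITION OF samples
    FOR VALUES IN ('30d')
    PARTITION BY RANGE (ts);

CREATE TABLE samples_30d_2024_01 PARTITION OF samples_30d
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Quick delete: drop the sub-partition once the 30-day policy has expired it.
DROP TABLE samples_30d_2024_01;
```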
It seems like the approach you might prefer is better support for non-hash partitioning, so that you can specify a distinct "space" partitioning by setting some column value, and then extending … Will discuss with the team how that fits into the roadmap. Thanks!
Mike, non-hash partitioning would also help with multi-tenanted systems, where you frequently need different retention policies. We currently have to let the system retain data for as long as the customer who pays for the highest retention period, which isn't ideal.
@mfreed, thanks for your consideration of this issue. But what can I do today? I am confident that I can manipulate the partition column and a partitioning function to get data into the correct partitions. The real question comes down to: can I do the deletes with TimescaleDB's current implementation? I don't think so with …
I am also going to need similar functionality. One possible solution I have thought about is having a timestamp for when things should be deleted instead; that way you are back to a single column for partitioning. Then when you drop chunks, you can drop anything that you want to delete at now() or sooner.
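A minimal sketch of this "expiry timestamp" idea, assuming the TimescaleDB 2.x API (create_hypertable, drop_chunks); the table and column names are illustrative, not from the comment.

```sql
-- Partition on an expiry timestamp instead of the observation time, so each
-- chunk groups rows that become deletable at the same time.
CREATE TABLE metrics (
    observed_at timestamptz NOT NULL,
    expire_at   timestamptz NOT NULL,   -- observed_at + the row's retention interval
    device_id   int,
    value       double precision
);

SELECT create_hypertable('metrics', 'expire_at',
                         chunk_time_interval => INTERVAL '1 day');

-- Periodically drop every chunk whose rows are all past their expiry.
SELECT drop_chunks('metrics', older_than => now());
```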
+1 to this request. |
We happen to have exactly the same use case as @andrew-blake described above: different tenants have different retention expectations, although the whole set of retention policies is quite small (under 20 entries), so dropping chunks on the combination of section/time, rather than time only, would work very efficiently.

Working back to "set the time when it has to expire" would work for this one particular use case, but would not help us much from the data-read perspective, which is why we would use TimescaleDB in the first place, so this is not really a solution.

Ideally, drop_chunks() should allow additional/optional arguments to accept a spatial-component filter when applying the time component.
I'm also really interested in this. Consider a multi-tenant system where each tenant has its own retention policy (between 3 months and 10 years); being able to instantly drop chunks belonging to a single tenant would be incredibly efficient!
+1 |
For some follow-up: we currently recommend using separate hypertables if you want this functionality, e.g., being able to have different data retention policies. Separate hypertables also have secondary advantages, especially in multi-tenant scenarios; a minimal sketch of this setup follows below.
This is actually the approach taken by Promscale, where different Prometheus metrics are stored in separate hypertables: https://github.com/timescale/promscale
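To make the recommendation concrete, here is a minimal sketch assuming the TimescaleDB 2.x retention API (add_retention_policy); the table names and intervals are illustrative.

```sql
-- One hypertable per retention policy; each gets its own retention job.
CREATE TABLE metrics_30d (ts timestamptz NOT NULL, device_id int, value double precision);
CREATE TABLE metrics_90d (LIKE metrics_30d);

SELECT create_hypertable('metrics_30d', 'ts');
SELECT create_hypertable('metrics_90d', 'ts');

SELECT add_retention_policy('metrics_30d', INTERVAL '30 days');
SELECT add_retention_policy('metrics_90d', INTERVAL '90 days');
```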
I switched to the described solution after receiving the same recommendation via Slack (for the multi-tenant problem I described earlier in this thread). This wouldn't feel strange for non-multi-tenant scenarios, but "table-per-tenant" does feel like an odd setup; still, it does work and does provide the stated advantages.
+1, any plans for this request?
By the way, the table …
I have the same need, but from a different use case. I have data from different devices, but some of them are less interesting to me than others. Still, I don't want to delete the less interesting data; I would like to move the less interesting devices to another Postgres tablespace. I already have data in the table, and the program ingesting the data doesn't differentiate between devices. I can move some data to another hypertable and run a cron job on a daily basis, but I'm looking for a more 'native' solution.
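A hedged sketch of the chunk-level building block for this, assuming TimescaleDB 2.x's show_chunks() and move_chunk(); the hypertable, chunk, and tablespace names are illustrative. Note that this moves whole chunks, so it only separates the "less interesting" devices if the hypertable is partitioned in a way that puts those devices in their own chunks.

```sql
-- List candidate chunks (here: everything older than 90 days).
SELECT show_chunks('metrics', older_than => INTERVAL '90 days');

-- Move one chunk (and its indexes) to a cheaper tablespace.
SELECT move_chunk(
    chunk                  => '_timescaledb_internal._hyper_1_4_chunk',
    destination_tablespace => 'slow_storage'
);
```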
If we follow the current recommendation for a multi-tenant solution, where we use different tables for tenants with different retention policies, then we might have a "30 day" table, a "60 day" table, and a "90 day" table. This could work pretty well as long as we know the retention at insertion time (which might require an additional lookup). However, when a tenant changes their mind and switches their contract from 30 days to 90 days, we'd have to copy all of their data out of the 30-day table and into the 90-day table instead of simply updating the retention policy for that tenant. This would probably leave a lot of holes in the old table (is that a bad thing?), and it would also be tricky to keep the data consistent during the copy and cut over to the new table.

The alternative is that every tenant gets their own table, which could prove rather unwieldy to manage once you get into the thousands of clients.

Most likely, how I would implement this is to retain the data for the full length of time for everyone, but hide it from the customer in application logic. This doesn't save any storage, but is probably the least painful.

I think ideally you could set policies based on WHERE conditions. Without any WHERE condition it would be "where the age of the insert is > some interval"; otherwise it would be "where the age is > some interval AND the user_agent_string is null AND the customer is Big Corp".
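There is no built-in "retention by WHERE clause" today, so as a hypothetical stand-in, this is roughly what the manual version would look like; it runs as an ordinary DELETE and therefore misses the fast chunk-drop path. Table and column names are illustrative.

```sql
-- Hypothetical manual retention by condition: a plain DELETE, no chunk dropping.
DELETE FROM metrics
WHERE ts < now() - INTERVAL '30 days'   -- this tenant's contracted retention
  AND tenant_id = 42;                   -- illustrative tenant still on 30 days
```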
Any progress on this? The workaround with "one hypertable per client" is really just a workaround, and won't work for every case.
Since one hypertable per client felt like an over-engineered solution for our team's taste, we decided to use direct deletion of segments from compressed chunks. This gist can be helpful for understanding how to find compressed chunk names by date range: https://gist.github.com/yarkoyarok/3277a27987415b40368b53d70f348add

In the compressed chunks you can then find rows that consist of the compressed data columns plus the raw, uncompressed segment_by fields; these rows are, in effect, segments. Deleting them happens without decompression and proceeds very fast (<1 sec to drop a segment vs. ~1:30 minutes for the same amount of data deleted directly from the hypertable, in our case). So you can not only drop chunks, but, if you work with compressed data, drop more atomic parts: segments. More information on this technique can also be found in #5802, where I've proposed that it be used by the query planner.
+1 for this. A good first step would be making it easy to find chunks by both the primary (time) and secondary dimension, so that we can at least manually implement retention policies.
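Until there is first-class support, one way to do this manually is to join TimescaleDB's internal catalog tables. This is a hedged sketch against the 2.x internal schema, which is undocumented and may change; also note that for a hash-partitioned space dimension the slice ranges are hash-bucket boundaries rather than raw column values. The hypertable name is illustrative.

```sql
-- Sketch: list each chunk together with the dimension slices it covers.
SELECT ch.schema_name AS chunk_schema,
       ch.table_name  AS chunk_name,
       d.column_name  AS dimension,
       ds.range_start,                 -- time in microseconds, or a hash bucket
       ds.range_end
FROM _timescaledb_catalog.chunk ch
JOIN _timescaledb_catalog.hypertable h        ON h.id = ch.hypertable_id
JOIN _timescaledb_catalog.chunk_constraint cc ON cc.chunk_id = ch.id
JOIN _timescaledb_catalog.dimension_slice ds  ON ds.id = cc.dimension_slice_id
JOIN _timescaledb_catalog.dimension d         ON d.id = ds.dimension_id
WHERE h.table_name = 'metrics'
ORDER BY ch.table_name, d.column_name;
```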
+1 on this. The workaround seems very prone to SQL injection if implemented badly in SQL drivers.
+1 |
+1, but I'm mostly interested in changing the chunking policy by a space dimension, i.e., changing chunk time intervals according to a secondary dimension, to avoid small chunks for low-resolution data.
I am evaluating TimescaleDB and am interested in using the space dimension as a retention policy field.
The idea being that we could use a function similar to drop_chunks() to drop chunks out of dimensions based on their retention policies. Something like this: …

I know from reading that you hash the partition column into the different chunks. If I read correctly, you turn that hash into an integer and then divide that integer by the number of partitions for assignment (see the sketch below).
I assume that if I have dropped chunks and a new value comes in from the out-of-range past, a new chunk would be created regardless of my scheme described above.
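A hedged sketch of the setup described in this issue, using the standard create_hypertable space-partitioning parameters; the table, column, and partition count are illustrative assumptions.

```sql
-- Hypertable with a time dimension plus a hash-partitioned space dimension
-- that encodes the retention policy, as described above.
CREATE TABLE metrics (
    ts              timestamptz NOT NULL,
    retention_class int         NOT NULL,   -- e.g. 30, 60, 90 (days); illustrative
    device_id       int,
    value           double precision
);

SELECT create_hypertable('metrics', 'ts',
                         partitioning_column => 'retention_class',
                         number_partitions   => 4);
```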