Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement/Documentation]: Improve warnings when enabling column store #7663

Open
jflambert opened this issue Feb 6, 2025 · 3 comments
Open
Labels
enhancement An enhancement to an existing feature for functionality

Comments

@jflambert
Copy link

jflambert commented Feb 6, 2025

What type of enhancement is this?

API improvement, User experience

What subsystems and features will be improved?

Compression

What does the enhancement do?

Using pg16.6-ts2.18.0

This is both a documentation and API improvement request. I am basically struggling to understand various warnings that come up when enabling compression with and without values for segmentby and orderby.

Consider the following hypertable which stores unique values per device/channel/signal/timestamp. The PK is somewhat important in order to reject duplicates due to poor QoS from edge devices.

DROP TABLE IF EXISTS test_signals;
CREATE TABLE test_signals
(
    device_id uuid NOT NULL,
    channel text NOT NULL,
    signal text NOT NULL,
    signal_timestamp TIMESTAMPTZ NOT NULL,
    value float NOT NULL,
    PRIMARY KEY (device_id, channel, signal, signal_timestamp)
);
SELECT create_hypertable('test_signals', by_range('signal_timestamp'));
ALTER TABLE test_signals SET (timescaledb.enable_columnstore);

Enabling compression without arguments yields a warning and two notices:

WARNING:  there was some uncertainty picking the default segment by for the hypertable: Please make sure device_id is not a unique column and appropriate for a segment by
NOTICE:  default segment by for hypertable "test_signals" is set to "device_id"
NOTICE:  default order by for hypertable "test_signals" is set to "channel, signal, signal_timestamp DESC"

I find this strange because the documentation mentions that both are optional, and the default value for segmentby is actually null?

Image

Fine, I'll provide device_id as a segmentby (makes sense), but why would I need channel and signal for orderby? I needed them for the PK, but I won't actually sort with them. I just want signal_timestamp really.

ALTER TABLE test_signals SET (
  timescaledb.enable_columnstore,
  timescaledb.segmentby='device_id',
  timescaledb.orderby='signal_timestamp'
);
WARNING:  column "channel" should be used for segmenting or ordering
WARNING:  column "signal" should be used for segmenting or ordering

This confuses me even more.

  • Why suggest channel/signal for segmenting? Should I actually provide all three?
  • Why was signal_timestamp by itself accepted without WARNING, yet the automatic value imposed channel and signal?

I think a bit more documentation would be appreciated. The "problem" here seems to be due to using a composite primary key. Would it be better practice to not use one, or a UNIQUE constraint instead?

Thanks in advance!

Implementation challenges

No response

@jflambert jflambert added the enhancement An enhancement to an existing feature for functionality label Feb 6, 2025
@jflambert
Copy link
Author

Hello @mkindahl sorry to pull you over but do you have some thoughts on these warnings, given my table structure? This is essentially a continuation of our index discussion.

@mkindahl
Copy link
Contributor

Hello @mkindahl sorry to pull you over but do you have some thoughts on these warnings, given my table structure? This is essentially a continuation of our index discussion.

There is a bunch of checks when creating the hypertable and they are quite basic, which is the reason you get these. Essentially, if there is a primary key or unique index, suggest to use all columns mentioned there as orderby columns, after removing any columns that are used as segmentby columns.

There is a similar set of very basic rules for segmentby, which causes the segmentby columns to be proposed.

Have a look at https://github.com/timescale/timescaledb/pull/6386/files, where these routines were added.

@jflambert
Copy link
Author

jflambert commented Feb 12, 2025

Thanks for the links to the internal documentation! I am now confident that picking device_id as segmentby is the correct choice in my use case.

However, the orderby is still a mystery. I now understand how the warning comes up (channel/signal "missing") but I don't understand why it's a "problem".

I see two outstanding issues:

  • the "official" (user-facing) ALTER TABLE (Compression) documentation is stale.
  • better explanation (vulgarization) of the orderby argument's impact on compressed data and (presumably) degraded query performance if incorrect values are used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement to an existing feature for functionality
Projects
None yet
Development

No branches or pull requests

2 participants