[Bug]: recompress_chunk doesn't update compression_stats #6221

Open
Tindarid opened this issue Oct 20, 2023 · 3 comments
Tindarid commented Oct 20, 2023

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Compression

What happened?

Hi!

After calling recompress_chunk on a chunk that is partially compressed, the compression statistics are not updated (the hypertable has a segment_by column list).

TimescaleDB version affected

2.12.1

PostgreSQL version used

14.7

What operating system did you use?

What installation method did you use?

Other

What platform did you run on?

On prem/Self-hosted, Other

Relevant log output and stack trace

No response

How can we reproduce the bug?

1. Create a hypertable with a segment_by setting
2. Insert some data to create 1 chunk
3. Compress the data
4. Fetch compression stats
5. Insert more data (into the same chunk)
6. Recompress the data
7. Fetch compression stats -> they are equal to the stats from step 4
8. Run VACUUM FULL (I noticed that space is reclaimed only after this operation; it could also be run on the chunk itself)
9. Fetch compression stats -> they are still equal to the stats from step 4
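
A minimal SQL sketch of these steps, assuming default settings (the table definition mirrors the script further below; the chunk name is illustrative and should be replaced by whatever show_chunks actually prints):

-- Step 1: hypertable with a segmentby column.
CREATE TABLE conditions(time TIMESTAMPTZ NOT NULL, device INTEGER, temperature FLOAT);
SELECT create_hypertable('conditions', 'time');
ALTER TABLE conditions SET (timescaledb.compress, timescaledb.compress_segmentby = 'device');

-- Steps 2-4: insert, compress, fetch baseline stats.
INSERT INTO conditions
    SELECT time, (random()*30)::int, random()*80 - 40
    FROM generate_series(NOW() - INTERVAL '1 day', NOW(), '1 minute') AS time;
SELECT compress_chunk(show_chunks('conditions'));
SELECT * FROM hypertable_compression_stats('conditions');

-- Steps 5-7: insert into the same chunk (making it partially compressed),
-- recompress, and fetch the stats again.
INSERT INTO conditions
    SELECT time, (random()*30)::int, random()*80 - 40
    FROM generate_series(NOW() - INTERVAL '1 day', NOW(), '1 minute') AS time;
SELECT show_chunks('conditions');  -- e.g. _timescaledb_internal._hyper_1_1_chunk
CALL recompress_chunk('_timescaledb_internal._hyper_1_1_chunk');
SELECT * FROM hypertable_compression_stats('conditions');  -- unchanged from step 4

-- Steps 8-9: the stats stay stale even after VACUUM FULL.
VACUUM FULL conditions;
SELECT * FROM hypertable_compression_stats('conditions');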
@Tindarid Tindarid added the bug label Oct 20, 2023
@mkindahl mkindahl self-assigned this Oct 23, 2023
@mkindahl (Contributor)

@Tindarid Thank you for the bug report. This was trivial to reproduce:

import psycopg2
from psycopg2.extras import RealDictCursor

def insert_data(cursor, name):
    # Two weeks of synthetic per-minute readings.
    cursor.execute(
        f"INSERT INTO {name} "
        "SELECT time, (random()*30)::int, random()*80 - 40 "
        "FROM generate_series(NOW() - INTERVAL '14 days', NOW(), '1 minute') AS time"
    )

def get_compression_stats(conn, name):
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM hypertable_compression_stats(%s)", (name,))
        return dict(cursor.fetchone())

def enable_compression(cursor, name, segmentby=None):
    options = ['timescaledb.compress']
    if segmentby:
        fields = ",".join(segmentby)
        options.append(f"timescaledb.compress_segmentby = '{fields}'")
    # Apply the options even when no segmentby list is given.
    cursor.execute(f"ALTER TABLE {name} SET ({','.join(options)})")

def setup(conn, name):
    with conn.cursor() as cursor:
        cursor.execute(f"DROP TABLE {name}")
        cursor.execute(
            f"CREATE TABLE {name}(time TIMESTAMPTZ NOT NULL, device INTEGER, temperature FLOAT)"
        )
        cursor.execute(r"SELECT * FROM create_hypertable(%s, 'time', 'device', 4)", (name,))
        insert_data(cursor, name)
        enable_compression(cursor, name, segmentby=['device'])
        cursor.execute("SELECT compress_chunk(show_chunks(%s))", (name,))

def recompress_chunks(conn, name):
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM show_chunks(%s)", (name,))
        chunks = [row['show_chunks'] for row in cursor]
        for chunk in chunks:
            cursor.execute("CALL recompress_chunk(%s, if_not_compressed => true)",
                           (chunk,))

def main():
    """Entrypoint for script."""
    conn = psycopg2.connect(dbname='mats', user='mats', host='/tmp')
    conn.autocommit = True
    conn.cursor_factory = RealDictCursor
    name = 'conditions'
    setup(conn, name)
    with conn.cursor() as cursor:
        cursor.execute(f"SELECT count(*) FROM {name}")
        count_before = cursor.fetchone()['count']
    stats_before = get_compression_stats(conn, name)
    with conn.cursor() as cursor:
        insert_data(cursor, name)
    recompress_chunks(conn, name)
    with conn.cursor() as cursor:
        cursor.execute(f"SELECT count(*) FROM {name}")
        count_after = cursor.fetchone()['count']
    stats_after = get_compression_stats(conn, name)

    print("count", "::", count_before, "->", count_after)
    for key in stats_before.keys():
        print(key, "::", stats_before[key], "->", stats_after[key])
        
if __name__ == '__main__':
    main()

Producing the output:

count :: 20161 -> 40322
total_chunks :: 12 -> 12
number_compressed_chunks :: 12 -> 12
before_compression_table_bytes :: 1392640 -> 1392640
before_compression_index_bytes :: 2146304 -> 2146304
before_compression_toast_bytes :: 0 -> 0
before_compression_total_bytes :: 3538944 -> 3538944
after_compression_table_bytes :: 229376 -> 237568
after_compression_index_bytes :: 196608 -> 196608
after_compression_toast_bytes :: 704512 -> 1433600
after_compression_total_bytes :: 1130496 -> 1867776
node_name :: None -> None
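
For completeness, the chunk-level counterpart of these numbers can be inspected with chunk_compression_stats(); presumably it shows the same stale values for the recompressed chunk (table name as in the script above):

SELECT chunk_name, before_compression_total_bytes, after_compression_total_bytes
  FROM chunk_compression_stats('conditions');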

@mkindahl (Contributor)

A similar issue, but not a duplicate, is #5881

@mkindahl mkindahl removed their assignment Nov 1, 2023
@jflambert

I have a similar, but opposite, issue: #7713

In my case, the stats don't update unless I recompress the chunks.
