
[Bug]: Segmentation fault when INNER JOINing two hypertables #4418

Closed
boxhock opened this issue Jun 5, 2022 · 10 comments · Fixed by #4760
Labels: bug, segfault

Comments

boxhock commented Jun 5, 2022

What type of bug is this?

Crash, Unexpected error

What subsystems and features are affected?

Query executor

What happened?

I am trying to run a query on a hypertable that INNER JOINs it with another hypertable, but this causes a segmentation fault. The expected behavior is for the query to complete without crashing.

Changing the query to use a LEFT JOIN works as expected.

TimescaleDB version affected

2.7.0

PostgreSQL version used

14.3

What operating system did you use?

Ubuntu 20.04.4 x86-64

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

2022-06-05 15:34:25.631 UTC [21278] LOG:  server process (PID 146466) was terminated by signal 11: Segmentation fault
2022-06-05 15:34:25.631 UTC [21278] DETAIL:  Failed process was running: SELECT time_bucket_gapfill('1d', tbl1.timestamp) as time,
	       count(*) as count
	FROM table1 tbl1
	         INNER JOIN table2 tbl2
	                    ON tbl1.tbl2_hash = tbl2.hash
	                        AND tbl1.timestamp = tbl2.timestamp
	WHERE tbl1.timestamp BETWEEN '2022-01-25T21:42:21.182Z' AND '2022-06-01T21:42:21.182Z'
	GROUP BY "time"
	ORDER BY "time"
2022-06-05 15:34:25.631 UTC [21278] LOG:  terminating any other active server processes
2022-06-05 15:34:25.831 UTC [21278] LOG:  all server processes terminated; reinitializing
2022-06-05 15:34:26.209 UTC [146469] LOG:  database system was interrupted; last known up at 2022-06-05 15:34:11 UTC
2022-06-05 15:34:26.209 UTC [146470] [unknown]@[unknown] LOG:  PID 146259 in cancel request did not match any process
2022-06-05 15:34:26.209 UTC [146471] [unknown]@[unknown] LOG:  PID 146261 in cancel request did not match any process
2022-06-05 15:34:26.259 UTC [146469] LOG:  database system was not properly shut down; automatic recovery in progress
2022-06-05 15:34:26.261 UTC [146469] LOG:  redo starts at 9285/BF1C36D8
2022-06-05 15:34:26.833 UTC [146472] [unknown]@[unknown] LOG:  PID 146260 in cancel request did not match any process
2022-06-05 15:34:26.833 UTC [146473] [unknown]@[unknown] LOG:  PID 146262 in cancel request did not match any process
2022-06-05 15:34:26.834 UTC [146474] [unknown]@[unknown] LOG:  PID 146263 in cancel request did not match any process
2022-06-05 15:34:27.483 UTC [146469] LOG:  redo done at 9285/F3CA1028 system usage: CPU: user: 0.49 s, system: 0.72 s, elapsed: 1.22 s

How can we reproduce the bug?

The exact steps to reproduce are not entirely clear to me, but I've reproduced this on multiple instances with roughly the following steps (see the sketch after the list):

1. Create two hypertables
2. Populate each hypertable with 1,000,000+ rows
  a) Each row in table1 has a reference to a row in table2
  b) Table2 contains more rows than table1
3. Run query to INNER JOIN table1 and table2
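
For illustration, a minimal self-contained sketch of these steps (the schemas, data distribution, and row counts below are hypothetical assumptions, not my actual schema or data) would look something like:

-- Two hypothetical hypertables; column names follow the query in the log output above.
CREATE TABLE table1 (
    "timestamp" timestamptz NOT NULL,
    tbl2_hash   bigint      NOT NULL
);
CREATE TABLE table2 (
    "timestamp" timestamptz NOT NULL,
    hash        bigint      NOT NULL
);
SELECT create_hypertable('table1', 'timestamp');
SELECT create_hypertable('table2', 'timestamp');

-- Populate both hypertables with 1,000,000+ rows; table2 has more rows than
-- table1, and every table1 row references a table2 row via tbl2_hash = hash.
INSERT INTO table2 ("timestamp", hash)
SELECT timestamptz '2022-01-01' + i * interval '10 seconds', i
FROM generate_series(1, 2000000) AS i;

INSERT INTO table1 ("timestamp", tbl2_hash)
SELECT timestamptz '2022-01-01' + i * interval '10 seconds', i
FROM generate_series(1, 1000000) AS i;

-- The INNER JOIN query from the log output above, which crashed the backend.
SELECT time_bucket_gapfill('1d', tbl1."timestamp") AS time,
       count(*) AS count
FROM table1 tbl1
         INNER JOIN table2 tbl2
                    ON tbl1.tbl2_hash = tbl2.hash
                        AND tbl1."timestamp" = tbl2."timestamp"
WHERE tbl1."timestamp" BETWEEN '2022-01-25T21:42:21.182Z' AND '2022-06-01T21:42:21.182Z'
GROUP BY "time"
ORDER BY "time";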
boxhock added the bug label Jun 5, 2022
nikkhils (Contributor) commented Jun 6, 2022

@boxhock can you please share the stack trace of the crashing backend? You can use the guide below to get a stack trace on Ubuntu:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#On_Ubuntu

@fabriziomello (Contributor)

Hey @boxhock, did you have a chance to get a stack trace as Nikhil asked above?

@horzsolt

Hey @boxhock, any chance you could provide a stack trace for us? Also, could you please share the exact schemas of the tables you used to reproduce the issue, and maybe some sample data? That would help us reproduce the issue consistently and find the root cause. Thank you.

@github-actions

This issue has been automatically marked as stale due to lack of activity. You can remove the stale label or comment. Otherwise, this issue will be closed in 30 days. Thank you!

@cocowalla

I have the same problem on PG 14.5 with TSDB 2.8.0: a segfault when using time_bucket_gapfill with an INNER JOIN between two hypertables. I'm considering going back to v1.7.5 😞

I'm running Postgres in Docker, and I have no clue how to get a stack trace out of it. If someone can advise, I'd be happy to try.

@jnidzwetzki (Contributor)

Hello @cocowalla,

Thank you very much for reaching out. Creating stack traces in a Docker container can be laborious. Do you have a self-contained example that reproduces the crash that you can share? I would then try to reproduce the crash on my system and generate a stack trace.

@cocowalla

@jnidzwetzki I was literally just putting something together, and I saw a notification that it's fixed!

What's the easiest way for me to test it before it hits v2.8.1 please?

@svenklemm (Member)

The easiest way is the nightly Docker image: timescaledev/timescaledb:nightly-pg14. You might have to wait until tomorrow for it to pick up the bugfix.

@cocowalla

> The easiest way is the nightly Docker image: timescaledev/timescaledb:nightly-pg14. You might have to wait until tomorrow for it to pick up the bugfix.

The latest timescaledev build is 20 days old (seems odd for a Nightly tag 😕!), so I built it myself instead. Now, the segfault is fixed, which is great!

But this issue only seems to be partially resolved: if I do an INNER JOIN between tables, or a LEFT JOIN that references the joined table in the WHERE clause, I get:

ORDER/GROUP BY expression not found in targetlist
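
For illustration, a query of roughly the failing shape (placeholder table and column names borrowed from the hypothetical sketch earlier in this thread, not my real schema) would look something like:

-- LEFT JOIN with the joined table referenced in the WHERE clause,
-- combined with time_bucket_gapfill.
SELECT time_bucket_gapfill('1d', tbl1."timestamp") AS time,
       count(*) AS count
FROM table1 tbl1
         LEFT JOIN table2 tbl2
                   ON tbl1.tbl2_hash = tbl2.hash
WHERE tbl1."timestamp" BETWEEN '2022-01-25T21:42:21.182Z' AND '2022-06-01T21:42:21.182Z'
  AND tbl2.hash IS NOT NULL
GROUP BY "time"
ORDER BY "time";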

I'll prepare a repro and create a new issue.

@cocowalla

@jnidzwetzki I just opened issue #4774 in relation to this
