Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize MPP FDW LIMIT/OFFSET push down when there is NULL/0. (#17246) #896

Merged
merged 3 commits into from
Jan 27, 2025

Conversation

avamingli
Copy link
Contributor

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


avamingli and others added 3 commits January 27, 2025 09:31
When there are NULL or zero values in OFFSET/LIMIT clause,
we do NOT need to fetch more rows and expression NULL plus others is pointless.

Optimize (N > 0)
  LIMIT 0 OFFSET N to LIMIT 0
  LIMIT 0 OFFSET NULL to LIMIT 0
  LIMIT N OFFSET NULL to LIMIT N
  LIMIT N OFFSET 0 to LIMIT N

Example before this fix:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, c2 FROM mpp_ft2 order by c1 limit 0 offset 998;
QUERY PLAN
--------------
 Limit
   Output: c1, c2
   ->  Gather Motion 2:1  (slice1; segments: 2)
         Output: c1, c2
         Merge Key: c1
         ->  Foreign Scan on public.mpp_ft2
               Output: c1, c2
               Remote SQL: SELECT c1, c2 FROM "MPP_S 1"."T 2" ORDER BY
c1 ASC NULLS LAST LIMIT (998::bigint + 0::bigint)

We will have to fetch 998 rows from remote, but as we have limit 0,
that's pointless.

With this fix:
EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1, c2 FROM mpp_ft2 order by c1 limit 0 offset 998;
QUERY PLAN
--------------
 Limit
   Output: c1, c2
   ->  Gather Motion 2:1  (slice1; segments: 2)
         Output: c1, c2
         Merge Key: c1
         ->  Foreign Scan on public.mpp_ft2
               Output: c1, c2
               Remote SQL: SELECT c1, c2 FROM "MPP_S 1"."T 2" ORDER BY
c1 ASC NULLS LAST LIMIT 0::bigint
Currently we lazily flush the forget commit record of a dtx and sync it to a
mirror. This has been fine since dtx recovery can take care of losing the
record and regenerate it. Also, allowing syncrep could have negative performance
impact so it shouldbe limited.

However, in the case of enabling hot standby with synchronous_mode=remote_apply,
it is absolutely required that the forget commit is applied on the standby as
soon as the query is completed on primary. Because otherwise, the standby might
not see the forget commit which makes it think the dtx still in-progress. So it
would view the changes made by the dtx invisible. This would be an incorrect
behavior of remote_apply.

So now request it before supporting hot standby in the subsequent commits.
Currently gpssh always clears out the TERM env variable before performing SSH connections. This will cause problems when users want only to use a specific terminal (like tmux) when performing SSH connections since the empty TERM will cause some terminals to fail.

To fix this issue, a retry operation is added when trying to login to a host. The first login attempt is made in the current manner where it clears out the `TERM` env variable. If it fails, then the operation is retried by restoring the `TERM` env variable.

`gpssh` will now also print the exception in case of errors so that we get to know the actual reason for SSH failures aiding us in debugging.

**Output During Failure**
Before:
```
[gpadmin@cdw ~]$ gpssh -h cdw
[ERROR] unable to login to cdw
Could not acquire connection.
```

After:
```
[gpadmin@cdw ~]$ gpssh -h cdw
[ERROR] unable to login to cdw
Could not acquire connection.
End Of File (EOF). Exception style platform.
<gpssh_modules.gppxssh_wrapper.PxsshWrapper object at 0x7ffa144b7860>
version: 3.3
command: /usr/bin/ssh
args: ['/usr/bin/ssh', '-o', 'StrictHostKeyChecking=no', '-o', 'BatchMode=yes', '-q', '-l', 'gpadmin', 'cdw']
searcher: <pexpect.searcher_re object at 0x7ffa144b7c50>
buffer (last 100 chars): b''
before (last 100 chars): b'  9 07:34:06 2024 from 10.0.34.166\r\r\nopen terminal failed: missing or unsuitable terminal: unknown\r\n'
after: <class 'pexpect.EOF'>
match: None
match_index: None
exitstatus: None
flag_eof: True
pid: 52832
child_fd: 3
closed: False
timeout: 30
delimiter: <class 'pexpect.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
```
@avamingli avamingli added the cherry-pick cherry-pick upstream commts label Jan 27, 2025
@avamingli avamingli merged commit e3fc633 into apache:main Jan 27, 2025
23 checks passed
@avamingli avamingli deleted the cp27 branch January 27, 2025 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick cherry-pick upstream commts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants