Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cluster-manual-failover-timeout to configure the timeout for manual failover #1690

Open
wants to merge 6 commits into
base: unstable
Choose a base branch
from

Conversation

enjoy-binbin
Copy link
Member

Allows cluster admins to configure the cluster manual failover timeout as
needed, admins can configure how long a primary would be paused in the
worst case scenario such as a failover timed out due to the insufficient
votes.

The configuration name is cluster-manual-failover-timeout, the unit is
milliseconds, and the default value is 5000.

…al failover

Allows cluster admins to configure the cluster manual failover timeout as
needed, admins can configure how long a primary would be paused in the
worst case scenario such as a failover timed out due to the insufficient
votes.

The configuration name is cluster-manual-failover-timeout, the unit is
milliseconds, and the default value is 5000.

Signed-off-by: Binbin <[email protected]>
@enjoy-binbin enjoy-binbin requested a review from a team February 8, 2025 03:51
valkey.conf Outdated
Comment on lines 1848 to 1866
# A manual failover is a special kind of failover that is usually executed when
# there are no actual failures, but we wish to swap the current primary with one
# of its replicas (which is the node we send the CLUSTER FAILOVER command to) in
# a safe way, without any window for data loss.
#
# It works in the following way:
# 1. The replica tells the primary to stop processing queries from clients.
# 2. The primary replies to the replica with the current replication offset.
# 3. The replica waits for the replication offset to match on its side, to make
# sure it processed all the data from the primary before it continues.
# 4. The replica starts a failover, obtains a new configuration epoch from the
# majority of the masters, and broadcasts the new configuration.
# 5. The old primary receives the configuration update: unpause its clients and
# starts replying with redirection messages so that they'll continue the chat
# with the new primary.
#
# This way clients are moved away from the old primary to the new primary atomically
# and only when the replica that is turning into the new primary has processed
# all of the replication stream from the old primary.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i took this from CLUSTER FAILOVER command docs, let me know if it is too much, please also feel free to insert a suggestion if you have a good wording.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also mention what's the behavior if it's set to 0, is it indefinite timeout?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should set a min and max value of it (that is not allow 0), but i am not sure about the value, do you have some ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I don't think a value of 0 makes sense since there is no way to abort (afaik, you might be able to trigger another failover). I would vote a minimum time of like 1000ms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we also want a max value? The big value also does not makes senese and we will hit the overflow issue when doing this now + (server.cluster_mf_timeout * CLUSTER_MF_PAUSE_MULT) if someone test it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, but agree a high value is bad since it's not interruptible. Maybe like a minute? (Seems like such a small range, maybe minimum should be lower, like 250ms)

Copy link

codecov bot commented Feb 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.87%. Comparing base (e9ed53c) to head (74d1642).
Report is 1 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1690      +/-   ##
============================================
- Coverage     71.08%   70.87%   -0.22%     
============================================
  Files           121      121              
  Lines         65254    65255       +1     
============================================
- Hits          46385    46248     -137     
- Misses        18869    19007     +138     
Files with missing lines Coverage Δ
src/cluster_legacy.c 85.90% <100.00%> (-0.36%) ⬇️
src/config.c 78.39% <ø> (ø)
src/server.h 100.00% <ø> (ø)

... and 13 files with indirect coverage changes

Signed-off-by: Binbin <[email protected]>
src/server.h Outdated Show resolved Hide resolved
valkey.conf Outdated
Comment on lines 1848 to 1866
# A manual failover is a special kind of failover that is usually executed when
# there are no actual failures, but we wish to swap the current primary with one
# of its replicas (which is the node we send the CLUSTER FAILOVER command to) in
# a safe way, without any window for data loss.
#
# It works in the following way:
# 1. The replica tells the primary to stop processing queries from clients.
# 2. The primary replies to the replica with the current replication offset.
# 3. The replica waits for the replication offset to match on its side, to make
# sure it processed all the data from the primary before it continues.
# 4. The replica starts a failover, obtains a new configuration epoch from the
# majority of the masters, and broadcasts the new configuration.
# 5. The old primary receives the configuration update: unpause its clients and
# starts replying with redirection messages so that they'll continue the chat
# with the new primary.
#
# This way clients are moved away from the old primary to the new primary atomically
# and only when the replica that is turning into the new primary has processed
# all of the replication stream from the old primary.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also mention what's the behavior if it's set to 0, is it indefinite timeout?

valkey.conf Outdated Show resolved Hide resolved
@enjoy-binbin enjoy-binbin added release-notes This issue should get a line item in the release notes major-decision-pending Major decision pending by TSC team labels Feb 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team release-notes This issue should get a line item in the release notes
Projects
Status: After RC1
Development

Successfully merging this pull request may close these issues.

3 participants