-
Notifications
You must be signed in to change notification settings - Fork 720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cluster-manual-failover-timeout to configure the timeout for manual failover #1690
base: unstable
Are you sure you want to change the base?
Conversation
…al failover Allows cluster admins to configure the cluster manual failover timeout as needed, admins can configure how long a primary would be paused in the worst case scenario such as a failover timed out due to the insufficient votes. The configuration name is cluster-manual-failover-timeout, the unit is milliseconds, and the default value is 5000. Signed-off-by: Binbin <[email protected]>
valkey.conf
Outdated
# A manual failover is a special kind of failover that is usually executed when | ||
# there are no actual failures, but we wish to swap the current primary with one | ||
# of its replicas (which is the node we send the CLUSTER FAILOVER command to) in | ||
# a safe way, without any window for data loss. | ||
# | ||
# It works in the following way: | ||
# 1. The replica tells the primary to stop processing queries from clients. | ||
# 2. The primary replies to the replica with the current replication offset. | ||
# 3. The replica waits for the replication offset to match on its side, to make | ||
# sure it processed all the data from the primary before it continues. | ||
# 4. The replica starts a failover, obtains a new configuration epoch from the | ||
# majority of the masters, and broadcasts the new configuration. | ||
# 5. The old primary receives the configuration update: unpause its clients and | ||
# starts replying with redirection messages so that they'll continue the chat | ||
# with the new primary. | ||
# | ||
# This way clients are moved away from the old primary to the new primary atomically | ||
# and only when the replica that is turning into the new primary has processed | ||
# all of the replication stream from the old primary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i took this from CLUSTER FAILOVER command docs, let me know if it is too much, please also feel free to insert a suggestion if you have a good wording.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also mention what's the behavior if it's set to 0, is it indefinite timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should set a min and max value of it (that is not allow 0), but i am not sure about the value, do you have some ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I don't think a value of 0 makes sense since there is no way to abort (afaik, you might be able to trigger another failover). I would vote a minimum time of like 1000ms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we also want a max value? The big value also does not makes senese and we will hit the overflow issue when doing this now + (server.cluster_mf_timeout * CLUSTER_MF_PAUSE_MULT)
if someone test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion, but agree a high value is bad since it's not interruptible. Maybe like a minute? (Seems like such a small range, maybe minimum should be lower, like 250ms)
Signed-off-by: Binbin <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1690 +/- ##
============================================
- Coverage 71.08% 70.87% -0.22%
============================================
Files 121 121
Lines 65254 65255 +1
============================================
- Hits 46385 46248 -137
- Misses 18869 19007 +138
|
Signed-off-by: Binbin <[email protected]>
valkey.conf
Outdated
# A manual failover is a special kind of failover that is usually executed when | ||
# there are no actual failures, but we wish to swap the current primary with one | ||
# of its replicas (which is the node we send the CLUSTER FAILOVER command to) in | ||
# a safe way, without any window for data loss. | ||
# | ||
# It works in the following way: | ||
# 1. The replica tells the primary to stop processing queries from clients. | ||
# 2. The primary replies to the replica with the current replication offset. | ||
# 3. The replica waits for the replication offset to match on its side, to make | ||
# sure it processed all the data from the primary before it continues. | ||
# 4. The replica starts a failover, obtains a new configuration epoch from the | ||
# majority of the masters, and broadcasts the new configuration. | ||
# 5. The old primary receives the configuration update: unpause its clients and | ||
# starts replying with redirection messages so that they'll continue the chat | ||
# with the new primary. | ||
# | ||
# This way clients are moved away from the old primary to the new primary atomically | ||
# and only when the replica that is turning into the new primary has processed | ||
# all of the replication stream from the old primary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also mention what's the behavior if it's set to 0, is it indefinite timeout?
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Signed-off-by: Binbin <[email protected]>
Allows cluster admins to configure the cluster manual failover timeout as
needed, admins can configure how long a primary would be paused in the
worst case scenario such as a failover timed out due to the insufficient
votes.
The configuration name is cluster-manual-failover-timeout, the unit is
milliseconds, and the default value is 5000.