Add cluster-manual-failover-timeout to configure the timeout for manual failover #1690

enjoy-binbin · 2025-02-08T03:51:14Z

Allows cluster admins to configure the cluster manual failover timeout as
needed. This lets admins control how long a primary is paused in the
worst case, for example when a failover times out due to insufficient
votes.

The configuration name is cluster-manual-failover-timeout, the unit is
milliseconds, and the default value is 5000.
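
To make the effect of the timeout concrete, here is a minimal sketch of how a millisecond timeout could turn into a deadline that bounds a manual failover. The `mf_state` struct and the `mf_start`, `mf_timed_out`, and `mf_abort_if_expired` helpers are hypothetical names used only for illustration; they are not taken from the Valkey source.

```c
#include <stdbool.h>

/* Default from the PR description: 5000 milliseconds. */
#define MF_DEFAULT_TIMEOUT_MS 5000

/* Hypothetical state for one manual failover attempt (illustration only). */
typedef struct {
    long long mf_end_ms; /* Deadline in ms; 0 means no failover in progress. */
} mf_state;

/* Start a manual failover: the deadline is "now" plus the configured timeout. */
static void mf_start(mf_state *s, long long now_ms, long long timeout_ms) {
    s->mf_end_ms = now_ms + timeout_ms;
}

/* True when a failover is in progress and its deadline has passed. */
static bool mf_timed_out(const mf_state *s, long long now_ms) {
    return s->mf_end_ms != 0 && now_ms > s->mf_end_ms;
}

/* Reset the state once the deadline has passed, ending the attempt. */
static void mf_abort_if_expired(mf_state *s, long long now_ms) {
    if (mf_timed_out(s, now_ms)) s->mf_end_ms = 0;
}
```

With the default value, a failover started at time 1000 would be abandoned by a periodic check that runs any time after 6000.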

Signed-off-by: Binbin <[email protected]>

enjoy-binbin · 2025-02-08T03:53:29Z

valkey.conf

+# A manual failover is a special kind of failover that is usually executed when
+# there are no actual failures, but we wish to swap the current primary with one
+# of its replicas (which is the node we send the CLUSTER FAILOVER command to) in
+# a safe way, without any window for data loss.
+#
+# It works in the following way:
+# 1. The replica tells the primary to stop processing queries from clients.
+# 2. The primary replies to the replica with the current replication offset.
+# 3. The replica waits for the replication offset to match on its side, to make
+#    sure it processed all the data from the primary before it continues.
+# 4. The replica starts a failover, obtains a new configuration epoch from the
+#    majority of the masters, and broadcasts the new configuration.
+# 5. The old primary receives the configuration update: unpause its clients and
+#    starts replying with redirection messages so that they'll continue the chat
+#    with the new primary.
+#
+# This way clients are moved away from the old primary to the new primary atomically
+# and only when the replica that is turning into the new primary has processed
+# all of the replication stream from the old primary.


enjoy-binbin (Feb 8, 2025): I took this from the CLUSTER FAILOVER command docs. Let me know if it is too much, and feel free to insert a suggestion if you have better wording.

hpatro (Feb 10, 2025): Could we also mention the behavior when it is set to 0? Is that an indefinite timeout?

enjoy-binbin (Feb 11, 2025): I think we should set a min and max value for it (that is, not allow 0), but I am not sure about the values. Do you have any ideas?

madolson (Feb 13, 2025): I agree, I don't think a value of 0 makes sense since there is no way to abort (afaik, you might be able to trigger another failover). I would vote for a minimum of something like 1000ms.

enjoy-binbin (Feb 14, 2025): Do we also want a max value? A big value does not make sense either, and we would hit an overflow in now + (server.cluster_mf_timeout * CLUSTER_MF_PAUSE_MULT) if someone tests it.

madolson (Feb 15, 2025): I don't have a strong opinion, but agree a high value is bad since it's not interruptible. Maybe like a minute? (That seems like such a small range, so maybe the minimum should be lower, like 250ms.)
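
The range discussed in this thread can be sketched as a small validation helper plus an overflow-safe deadline computation. This is only an illustration: the 250 and 60000 bounds come from the review, but `PAUSE_MULT` is a placeholder for CLUSTER_MF_PAUSE_MULT with an assumed value of 2, and the function names are hypothetical rather than the ones used in the patch.

```c
#include <limits.h>

#define MF_TIMEOUT_MIN_MS 250   /* Minimum suggested in the review. */
#define MF_TIMEOUT_MAX_MS 60000 /* Maximum suggested in the review. */
#define PAUSE_MULT 2            /* Assumed multiplier, for illustration only. */

/* Returns 1 when the timeout lies inside the accepted range, else 0.
 * A value of 0 is rejected, so there is no "indefinite" timeout. */
static int mf_timeout_is_valid(long long timeout_ms) {
    return timeout_ms >= MF_TIMEOUT_MIN_MS && timeout_ms <= MF_TIMEOUT_MAX_MS;
}

/* Computes now + timeout * PAUSE_MULT, or -1 if the timeout is out of
 * range or the addition would overflow a long long. Because the timeout is
 * capped at MF_TIMEOUT_MAX_MS, the multiplication itself cannot overflow. */
static long long mf_pause_deadline(long long now_ms, long long timeout_ms) {
    if (!mf_timeout_is_valid(timeout_ms)) return -1;
    long long pause_ms = timeout_ms * PAUSE_MULT;
    if (now_ms > LLONG_MAX - pause_ms) return -1;
    return now_ms + pause_ms;
}
```

Bounding the value at configuration time keeps the later arithmetic simple, since callers no longer need to worry about extreme inputs.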

codecov · 2025-02-08T04:16:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.87%. Comparing base (e9ed53c) to head (74d1642).
Report is 1 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1690      +/-   ##
============================================
- Coverage     71.08%   70.87%   -0.22%     
============================================
  Files           121      121              
  Lines         65254    65255       +1     
============================================
- Hits          46385    46248     -137     
- Misses        18869    19007     +138

| Files with missing lines | Coverage Δ |
|---|---|
| src/cluster_legacy.c | `85.90% <100.00%> (-0.36%)` ⬇️ |
| src/config.c | `78.39% <ø> (ø)` |
| src/server.h | `100.00% <ø> (ø)` |

... and 13 files with indirect coverage changes

hpatro · 2025-02-10T21:51:25Z

Could we also mention what's the behavior if it's set to 0, is it indefinite timeout?

enjoy-binbin requested a review from a team February 8, 2025 03:51

enjoy-binbin commented Feb 8, 2025

correct it is a write pause

042bf9e

Signed-off-by: Binbin <[email protected]>

Fix format

74d1642

Signed-off-by: Binbin <[email protected]>

hpatro reviewed Feb 10, 2025

madolson reviewed Feb 13, 2025

enjoy-binbin added 3 commits February 14, 2025 10:31

Code review, set the min to 1000

c071a3d

Signed-off-by: Binbin <[email protected]>

Merge remote-tracking branch 'upstream/unstable' into cluster_mf_timeout

5f249b8

Signed-off-by: Binbin <[email protected]>

Code review, set the min to 250 and max to 60000

2c4a293

Signed-off-by: Binbin <[email protected]>

enjoy-binbin added the release-notes (This issue should get a line item in the release notes) and major-decision-pending (Major decision pending by TSC team) labels on Feb 15, 2025
