CA-403759: Initialise licensing after no-other-masters check (#6257)
When the coordinator restarts, the no-other-masters check in the startup
sequence does two things for each pool member:

1. It checks that the host agrees that it is not the coordinator.
2. It unblocks the host's master_connection thread, which is likely
   waiting for a reconnection delay to expire. This delay may be up to
   256 seconds (exponential backoff is used). The delay is interrupted
   to immediately unblock DB calls.
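
To illustrate why a member can be stuck for so long, here is a minimal sketch of a capped exponential backoff. Only the 256-second ceiling comes from the text above; the initial delay of 1 second, the doubling factor, and the names `initial_delay`, `next_delay`, and `delays` are assumptions made for illustration and are not taken from master_connection.ml.

```ocaml
(* Hypothetical sketch of a capped exponential backoff; not xapi code. *)

(* Ceiling mentioned in the commit message. *)
let max_delay = 256.0

(* Assumption: the first retry waits one second. *)
let initial_delay = 1.0

(* Assumption: each failed attempt doubles the delay, up to the ceiling. *)
let next_delay d = min max_delay (d *. 2.0)

(* Delays (in seconds) before each of the first [n] reconnection attempts. *)
let delays n =
  let rec go acc d k =
    if k = 0 then List.rev acc else go (d :: acc) (next_delay d) (k - 1)
  in
  go [] initial_delay n
```

Under these assumptions, a member that has already failed nine attempts would sleep for the full 256 seconds unless something interrupts the delay, which is what the check does.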

Licensing initialisation comes earlier in the startup sequence, but
under certain circumstances it makes calls to other hosts, in particular
after an upgrade. At this time, hosts may still be blocked on the
master_connection for up to 256 s, which adds an unnecessary delay to
the coordinator's startup sequence and therefore delays the point at
which the API becomes usable.

Address this by reversing the order of the two startup actions.
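
The effect of the reordering can be modelled with a small, self-contained sketch. It is a toy model, not the xapi startup code: the flag `members_unblocked`, the `log` list, the `simulate` helper, and the task label "No-other-masters check" are hypothetical; only the function names `handle_licensing` and `check_no_other_masters` and the label "Initialising licensing" echo the diff.

```ocaml
(* Toy model of an ordered startup sequence; not the real xapi code. *)

(* Hypothetical flag: set once the check has interrupted the members' backoff. *)
let members_unblocked = ref false

(* Hypothetical event log, most recent entry first. *)
let log : string list ref = ref []

(* Stand-in for the real check: here it only unblocks the members. *)
let check_no_other_masters () = members_unblocked := true

(* Stand-in for licensing: records when it would hit a still-blocked member. *)
let handle_licensing () =
  if not !members_unblocked then
    (log := "licensing waits on a member still in backoff" :: !log)

let old_order =
  [ ("Initialising licensing", handle_licensing)
  ; ("No-other-masters check", check_no_other_masters) ]

let new_order = List.rev old_order

(* Run the tasks in order from a clean state and return the events, oldest first. *)
let simulate tasks =
  members_unblocked := false ;
  log := [] ;
  List.iter (fun (name, f) -> log := ("start: " ^ name) :: !log ; f ()) tasks ;
  List.rev !log
```

In this model, `simulate old_order` records the licensing step waiting on a blocked member, while `simulate new_order` records only the two task starts.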
robhoes authored Jan 28, 2025
2 parents d6e1139 + f346848 commit 8a293a0
Showing 2 changed files with 5 additions and 12 deletions.
15 changes: 4 additions & 11 deletions ocaml/database/master_connection.ml
@@ -235,24 +235,17 @@ let do_db_xml_rpc_persistent_with_reopen ~host:_ ~path (req : string) :
           let time_sofar = Unix.gettimeofday () -. time_call_started in
           if !connection_timeout < 0. then (
             if not !surpress_no_timeout_logs then (
-              debug
-                "Connection to master died. I will continue to retry indefinitely \
-                 (supressing future logging of this message)." ;
               error
                 "Connection to master died. I will continue to retry indefinitely \
-                 (supressing future logging of this message)."
-            ) ;
-            surpress_no_timeout_logs := true
+                 (supressing future logging of this message)." ;
+              surpress_no_timeout_logs := true
+            )
           ) else
             debug
               "Connection to master died: time taken so far in this call '%f'; will \
                %s"
               time_sofar
-              ( if !connection_timeout < 0. then
-                  "never timeout"
-                else
-                  Printf.sprintf "timeout after '%f'" !connection_timeout
-              ) ;
+              (Printf.sprintf "timeout after '%f'" !connection_timeout) ;
           if time_sofar > !connection_timeout && !connection_timeout >= 0. then
             if !restart_on_connection_timeout then (
               debug "Exceeded timeout for retrying master connection: restarting xapi" ;
2 changes: 1 addition & 1 deletion ocaml/xapi/xapi.ml
@@ -1218,7 +1218,6 @@ let server_init () =
             , []
             , Monitor_master.update_configuration_from_master
             )
-          ; ("Initialising licensing", [], handle_licensing)
           ; ( "message_hook_thread"
             , [Startup.NoExnRaising]
             , Xapi_message.start_message_hook_thread ~__context
@@ -1252,6 +1251,7 @@ let server_init () =
             , [Startup.OnlyMaster]
             , check_no_other_masters
             )
+          ; ("Initialising licensing", [], handle_licensing)
           ; ( "Registering periodic functions"
             , []
             , fun () -> Xapi_periodic_scheduler_init.register ~__context
