Nondeterministic behavior observed despite seed set #7525

Eladkay · 2025-01-23T04:01:37Z

I am trying to get Z3 to behave deterministically on a formula outside of the decidable fragment. I don't really mind more unknowns, but it is important that Z3 returns the same result every call. I was under the impression that it would be sufficient to set the random seed, but this seems to not be the case:

% z3 --version
Z3 version 4.13.4 - 64 bit
% z3 test.smt -t:10000 smt.random_seed=10101
unsat
unsat
unknown
unsat
unknown
% z3 test.smt -t:10000 smt.random_seed=10101
unsat
unsat
unknown
unknown
unsat

What other sources of nondeterminism could there be? Is this a bug in Z3 or am I missing something? This isn't a "borderline timeout" issue.

I am running on a MacBook Pro M2 with macOS 15.1.1. I also attach test.smt (renamed for GitHub).

Edit for more information: The formula has 5 check-sat-assuming commands at the end. The first two always return unsat with no issue, the third seems to always return unknown and the latter two may return unsat or unknown seemingly nondeterministically. However, when commenting out the first two commands, the latter three seem to always return sat; unsat; unsat. So, it seems that the first check-sat-assuming commands are interfering with the latter ones.

TIA!

test.txt

The text was updated successfully, but these errors were encountered:

NikolajBjorner · 2025-01-23T05:49:12Z

You are setting a timeout from the command prompt. The timeout fires in a way independent of CPU cycles.
You can use rlimit (inside of the SMT file each check-sat) to control resource usage.

What may be non-obvious is that the timeout value you use appears very high. I get non-deterministic behavior for -t:100 not your value on Windows. Are you therefore reporting a Mac issue with timeouts?

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100
unsat
unknown
unknown
unsat
unknown

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100
unsat
unknown
unknown
unknown
unknown

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100000
unsat
unsat
sat
unsat
unsat

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100000
unsat
unsat
sat
unsat

Eladkay · 2025-01-23T06:38:42Z

I was actually under the impression that the timeout is in milliseconds, so it's not so long. The timeout being considered long just strengthens my claim that this is not caused by minor deviations in OS interrupt time causing the formula to sometimes exceed the timeout despite the solver supposedly doing the same thing.

Indeed, on my computer I get "non-deterministic" output for the lower timeout you suggested:

% z3 -t:100 -v:1 smt_tests/nondet_behavior.smt smt.random_seed=10101
unsat
unsat
unknown
unknown
unknown
% z3 -t:100 -v:1 smt_tests/nondet_behavior.smt smt.random_seed=10101
unsat
unsat
unknown
unsat
unsat

Which I think may be attributed to OS time.

I don't think this is a timeout issue. The timeout is just because it'd otherwise probably run forever. I'm claiming that the solver itself introduces non-determinism that does not depend only on the seed. Also note the weird behavior I observed in the edit to the original issue, where calls to check-sat-assuming change the behavior of latter calls.

Thanks for the quick response!

NikolajBjorner · 2025-01-23T07:07:01Z

If you simply remove the timeout arguments from the command-line you could know if to rule out the timer from this behavior.

Next, add a line (get-info :reason-unknown) after each check-sat.
If the reason is "canceled" or "timeout", it would be consistent with the timer.

Next, it could be platform behavior.
Platform behavior is tough to establish and debug as it requires two platforms with deviating behavior.
After that the easier way to pinpoint is to instrument mk_app_core in ast.cpp to track identifiers and hashes of hash objects and break at the first difference.

Eladkay · 2025-01-23T07:47:23Z

It is "timeout", but the question is twofold:

Why does the solver behave in a way that causes it to sometimes finish within 10 seconds and sometimes not? The timeout is just an indicator that this is happening, but the question is why do we even reach the timeout sometimes and sometimes we don't.
Why are these check-sat-assuming calls affected so significantly, consistently and negatively by previous, unrelated check-sat-assuming calls, even if each check-sat-assuming call is wrapped in (push) ... (pop)? This seems like a bug, which could indeed be a platform-specific bug.

I have asked a colleague to run the queries on WSL (Linux) running on a Windows computer, and got the same behavior:

~$ z3 --version
Z3 version 4.13.4 - 64 bit
~$ z3 -t:10000 -v:1 nondet_behavior_1.smt smt.random_seed=10101 | tee nondet_behavior_1.out
sat
unsat
unsat
~$ z3 -t:10000 -v:1 nondet_behavior_2.smt smt.random_seed=10101 | tee nondet_behavior_2.out
unsat
unsat
unknown
unsat
unsat

(The two queries are exactly the same, except the first one has the first two check-sat-assuming calls commented out)

Eladkay · 2025-01-23T08:04:13Z

Here it is on Windows as well:

PS snip> .\z3\z3.exe --version
Z3 version 4.13.4 - 64 bit
PS snip> .\z3\z3.exe -t:10000 -v:1 .\nondet_behavior_1.smt smt.random_seed=10101
sat
unsat
unsat
PS snip> .\z3\z3.exe -t:10000 -v:1 .\nondet_behavior_2.smt smt.random_seed=10101
unsat
unsat
unknown
unknown
unknown

So, it doesn't seem platform-specific.

Eladkay · 2025-01-24T02:55:32Z

Also, it's the same behavior when using (push)...(pop) to replace check-sat-assuming entirely:

% tail -16 smt_tests/nondet_behavior_1.smt
 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal actInd_send_1a invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
 
 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal actInd_join_round invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)

 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal actInd_propose invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
 
 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal actInd_cast_vote invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
 
 (push) (assert (and actInd_decide invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
%  z3 -t:100 -v:1 smt_tests/nondet_behavior_1.smt smt.random_seed=10101
unsat
unsat
unknown
unknown

nondet_behavior_1.txt

NikolajBjorner · 2025-01-24T17:36:54Z

I have been testing this on newest builds. It doesn't reproduce Windows, wsl2.
I tried also on a different branch based on older versions of z3. There, the third step diverges.
So, I believe the behavior was "fixed" since. It would take some bisection to pinpoint where and whether the behavior is only masked.

Eladkay · 2025-01-25T13:09:58Z

I can confirm that the behavior in the latest master seems to be much more reasonable:

% ./z3 -t:10000 -v:1 nondet_behavior_1_new.smt smt.random_seed=10101 
unknown
unknown
unknown
% ./z3 -t:10000 -v:1 nondet_behavior_2_new.smt smt.random_seed=10101
unsat
unsat
sat
unsat
unsat

It's still quite weird how running check-sat-assuming changes the result of other check-sat-assuming calls to such a large degree. Do you have any insight on that? Do you think it'd be a good idea to run check-sat-assuming without actually needing the result in order to "tip off" the solver?

NikolajBjorner · 2025-02-02T22:22:32Z

The behavior of the solver is determined by a number of factors, such as the current set of expressions that are created. If you creaute expressions in different order, it results in a different solver state. If you run a solver call and interrupt it, the state after interruption is not the same as the state before the solver was invoked. It is determined by the state changes during solving.
So then the solver, as it is heuristic, can take a different path towards solving formulas.

NikolajBjorner closed this as completed Feb 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nondeterministic behavior observed despite seed set #7525

Nondeterministic behavior observed despite seed set #7525

Eladkay commented Jan 23, 2025 •

edited

Loading

NikolajBjorner commented Jan 23, 2025

Eladkay commented Jan 23, 2025

NikolajBjorner commented Jan 23, 2025

Eladkay commented Jan 23, 2025 •

edited

Loading

Eladkay commented Jan 23, 2025

Eladkay commented Jan 24, 2025

NikolajBjorner commented Jan 24, 2025

Eladkay commented Jan 25, 2025

NikolajBjorner commented Feb 2, 2025

Nondeterministic behavior observed despite seed set #7525

Nondeterministic behavior observed despite seed set #7525

Comments

Eladkay commented Jan 23, 2025 • edited Loading

NikolajBjorner commented Jan 23, 2025

Eladkay commented Jan 23, 2025

NikolajBjorner commented Jan 23, 2025

Eladkay commented Jan 23, 2025 • edited Loading

Eladkay commented Jan 23, 2025

Eladkay commented Jan 24, 2025

NikolajBjorner commented Jan 24, 2025

Eladkay commented Jan 25, 2025

NikolajBjorner commented Feb 2, 2025

Eladkay commented Jan 23, 2025 •

edited

Loading

Eladkay commented Jan 23, 2025 •

edited

Loading