Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nondeterministic behavior observed despite seed set #7525

Closed
Eladkay opened this issue Jan 23, 2025 · 9 comments
Closed

Nondeterministic behavior observed despite seed set #7525

Eladkay opened this issue Jan 23, 2025 · 9 comments

Comments

@Eladkay
Copy link

Eladkay commented Jan 23, 2025

I am trying to get Z3 to behave deterministically on a formula outside of the decidable fragment. I don't really mind more unknowns, but it is important that Z3 returns the same result every call. I was under the impression that it would be sufficient to set the random seed, but this seems to not be the case:

% z3 --version
Z3 version 4.13.4 - 64 bit
% z3 test.smt -t:10000 smt.random_seed=10101
unsat
unsat
unknown
unsat
unknown
% z3 test.smt -t:10000 smt.random_seed=10101
unsat
unsat
unknown
unknown
unsat

What other sources of nondeterminism could there be? Is this a bug in Z3 or am I missing something? This isn't a "borderline timeout" issue.

I am running on a MacBook Pro M2 with macOS 15.1.1. I also attach test.smt (renamed for GitHub).

Edit for more information: The formula has 5 check-sat-assuming commands at the end. The first two always return unsat with no issue, the third seems to always return unknown and the latter two may return unsat or unknown seemingly nondeterministically. However, when commenting out the first two commands, the latter three seem to always return sat; unsat; unsat. So, it seems that the first check-sat-assuming commands are interfering with the latter ones.

TIA!

test.txt

@NikolajBjorner
Copy link
Contributor

You are setting a timeout from the command prompt. The timeout fires in a way independent of CPU cycles.
You can use rlimit (inside of the SMT file each check-sat) to control resource usage.

What may be non-obvious is that the timeout value you use appears very high. I get non-deterministic behavior for -t:100 not your value on Windows. Are you therefore reporting a Mac issue with timeouts?

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100
unsat
unknown
unknown
unsat
unknown

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100
unsat
unknown
unknown
unknown
unknown

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100000
unsat
unsat
sat
unsat
unsat

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100000
unsat
unsat
sat
unsat
unsat

C:\z3\release>z3 test.smt2 smt.random_seed=10101 /v:1 -t:100000
unsat
unsat
sat
unsat

@Eladkay
Copy link
Author

Eladkay commented Jan 23, 2025

I was actually under the impression that the timeout is in milliseconds, so it's not so long. The timeout being considered long just strengthens my claim that this is not caused by minor deviations in OS interrupt time causing the formula to sometimes exceed the timeout despite the solver supposedly doing the same thing.

Indeed, on my computer I get "non-deterministic" output for the lower timeout you suggested:

% z3 -t:100 -v:1 smt_tests/nondet_behavior.smt smt.random_seed=10101
unsat
unsat
unknown
unknown
unknown
% z3 -t:100 -v:1 smt_tests/nondet_behavior.smt smt.random_seed=10101
unsat
unsat
unknown
unsat
unsat

Which I think may be attributed to OS time.

I don't think this is a timeout issue. The timeout is just because it'd otherwise probably run forever. I'm claiming that the solver itself introduces non-determinism that does not depend only on the seed. Also note the weird behavior I observed in the edit to the original issue, where calls to check-sat-assuming change the behavior of latter calls.

Thanks for the quick response!

@NikolajBjorner
Copy link
Contributor

If you simply remove the timeout arguments from the command-line you could know if to rule out the timer from this behavior.

Next, add a line (get-info :reason-unknown) after each check-sat.
If the reason is "canceled" or "timeout", it would be consistent with the timer.

Next, it could be platform behavior.
Platform behavior is tough to establish and debug as it requires two platforms with deviating behavior.
After that the easier way to pinpoint is to instrument mk_app_core in ast.cpp to track identifiers and hashes of hash objects and break at the first difference.

@Eladkay
Copy link
Author

Eladkay commented Jan 23, 2025

It is "timeout", but the question is twofold:

  • Why does the solver behave in a way that causes it to sometimes finish within 10 seconds and sometimes not? The timeout is just an indicator that this is happening, but the question is why do we even reach the timeout sometimes and sometimes we don't.
  • Why are these check-sat-assuming calls affected so significantly, consistently and negatively by previous, unrelated check-sat-assuming calls, even if each check-sat-assuming call is wrapped in (push) ... (pop)? This seems like a bug, which could indeed be a platform-specific bug.

I have asked a colleague to run the queries on WSL (Linux) running on a Windows computer, and got the same behavior:

~$ z3 --version
Z3 version 4.13.4 - 64 bit
~$ z3 -t:10000 -v:1 nondet_behavior_1.smt smt.random_seed=10101 | tee nondet_behavior_1.out
sat
unsat
unsat
~$ z3 -t:10000 -v:1 nondet_behavior_2.smt smt.random_seed=10101 | tee nondet_behavior_2.out
unsat
unsat
unknown
unsat
unsat

(The two queries are exactly the same, except the first one has the first two check-sat-assuming calls commented out)

@Eladkay
Copy link
Author

Eladkay commented Jan 23, 2025

Here it is on Windows as well:

PS snip> .\z3\z3.exe --version
Z3 version 4.13.4 - 64 bit
PS snip> .\z3\z3.exe -t:10000 -v:1 .\nondet_behavior_1.smt smt.random_seed=10101
sat
unsat
unsat
PS snip> .\z3\z3.exe -t:10000 -v:1 .\nondet_behavior_2.smt smt.random_seed=10101
unsat
unsat
unknown
unknown
unknown

So, it doesn't seem platform-specific.

@Eladkay
Copy link
Author

Eladkay commented Jan 24, 2025

Also, it's the same behavior when using (push)...(pop) to replace check-sat-assuming entirely:

% tail -16 smt_tests/nondet_behavior_1.smt
 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal actInd_send_1a invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
 
 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal actInd_join_round invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)

 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal actInd_propose invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
 
 (push) (assert (and (not actInd_decide) invInd_PaxosFirstOrder_choosable_proposal actInd_cast_vote invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
 
 (push) (assert (and actInd_decide invInd_PaxosFirstOrder_choosable_proposal (not actInd_cast_vote) invInd_PaxosFirstOrder_choosable_proposal (not actInd_propose) invInd_PaxosFirstOrder_choosable_proposal (not actInd_join_round) invInd_PaxosFirstOrder_choosable_proposal (not actInd_send_1a) invInd_PaxosFirstOrder_choosable_proposal ))
 (check-sat) (pop)
%  z3 -t:100 -v:1 smt_tests/nondet_behavior_1.smt smt.random_seed=10101
unsat
unsat
unknown
unknown

nondet_behavior_1.txt

@NikolajBjorner
Copy link
Contributor

I have been testing this on newest builds. It doesn't reproduce Windows, wsl2.
I tried also on a different branch based on older versions of z3. There, the third step diverges.
So, I believe the behavior was "fixed" since. It would take some bisection to pinpoint where and whether the behavior is only masked.

@Eladkay
Copy link
Author

Eladkay commented Jan 25, 2025

I can confirm that the behavior in the latest master seems to be much more reasonable:

% ./z3 -t:10000 -v:1 nondet_behavior_1_new.smt smt.random_seed=10101 
unknown
unknown
unknown
% ./z3 -t:10000 -v:1 nondet_behavior_2_new.smt smt.random_seed=10101
unsat
unsat
sat
unsat
unsat

It's still quite weird how running check-sat-assuming changes the result of other check-sat-assuming calls to such a large degree. Do you have any insight on that? Do you think it'd be a good idea to run check-sat-assuming without actually needing the result in order to "tip off" the solver?

@NikolajBjorner
Copy link
Contributor

The behavior of the solver is determined by a number of factors, such as the current set of expressions that are created. If you creaute expressions in different order, it results in a different solver state. If you run a solver call and interrupt it, the state after interruption is not the same as the state before the solver was invoked. It is determined by the state changes during solving.
So then the solver, as it is heuristic, can take a different path towards solving formulas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants