-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Should A ∧ !A always be false? #15839
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this holds, because we are dealing with three-valued logic (there is the Ambiguous
possibility), and a value of Ambiguous
truthiness can change truthiness between one time it is checked and another.
Multiple checks of the truthiness of the same condition only occur in while loops, which is why all of the test failures occur in while-loop tests. This PR models the premise that a while-loop test can never change truthiness over time, which would indeed lead to the conclusion seen in the test failures, that a while
loop with no break
must either be never entered at all, or must be an infinite loop once entered.
I think in order for this PR to be valid, we would need to create two separate visibility constraints for the same while-loop test expression, one for "as checked on first entry" and another for "as checked on subsequent iteration", so that we wouldn't actually see those as "the same constraint", and would thus model the possibility of entering the loop and then leaving it on a later iteration. (I think this same change in while
loop visibility constraints would be necessary if we switch to BDDs for visibility constraints.)
I think technically the existing OR simplification is wrong for the same reason? But in practice it doesn't cause a problem, either because the OR simplification case doesn't arise with while loops, or because simplifying to ALWAYS_TRUE
doesn't lead us to eliminate paths as unreachable, or for both reasons.
cc @sharkdp, you might find this interesting. |
Yep, that makes sense! The part that I wasn't sure about was what the scope of "one time it is checked and another" is. My intuition would be that a literal (in the boolean formula sense, not the Python source sense) might evaluate to
Or put another way, while-loop tests are the only place (at least for now) where we need a visibility constraint to model the relationship between two different states in the execution trace of the program.
Definitely agree — and I think that's a better wording of my question! 😄 Either both |
The problem is that we construct boolean formulae (for visibility of paths through a If we instead created separate visibility constraints (referring to the same test expression) for the truthiness checks in these two different execution states, then the simplification would hold. (I think this change in the semantic index builder's handling of |
Once we model the back-edge in control flow in a while loop, we could even evaluate the test expression under two different conditions, one without the back-edge (for first entry) and one with the back-edge (for later iteration). This could change our evaluation of its static truthiness, too. (Imagine an |
* main: [`flake8-pyi`] Fix several correctness issues with `custom-type-var-return-type` (`PYI019`) (#15851) [`pyupgrade`] Reuse replacement logic from `UP046` and `UP047` (`UP040`) (#15840) [`refurb`] Avoid `None | None` as well as better detection and fix (`FURB168`) (#15779) Remove non-existing `lint.extendIgnore` editor setting (#15844) [`refurb`] Mark fix as unsafe if there are comments (`FURB171`) (#15832) [`flake8-comprehensions`] Skip when `TypeError` present from too many (kw)args for `C410`,`C411`, and `C418` (#15838) [`pyflakes`] Visit forward annotations in `TypeAliasType` as types (`F401`) (#15829)
This is done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this fixes the test failures, sweet!
// The first time the body is executed, we will have only checked the first | ||
// evaluation of the condition, which must be True. For the second and later times, | ||
// the first evaluation must still have been True, and a later evaluation was also | ||
// True. So the full visibility constraint is (first ∨ (first ∧ later)). | ||
let first_and_later = self | ||
.current_visibility_constraints_mut() | ||
.add_and_constraint(first_vis_constraint_id, later_vis_constraint_id); | ||
let body_vis_constraint_id = self | ||
.current_visibility_constraints_mut() | ||
.add_or_constraint(first_vis_constraint_id, first_and_later); | ||
self.record_visibility_constraint_id(body_vis_constraint_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't (first ∨ (first ∧ later))
simplify to just first
? If first
is true, it is always true; if first
is false, it is always false.
Which I think is also intuitive, because for the body of the while loop to be reachable, it is both necessary and sufficient that the first evaluation of the test condition be truthy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch! I had thought I needed to assert that later evaluations of the condition also evaluate to true. But this shows that that's not the right interpretation — we don't know that we did evaluate the condition a second time, and so executing the body does not imply that (there was) a later evaluation (and it) evaluated to true.
let first_vis_constraint_id = | ||
self.add_visibility_constraint(VisibilityConstraint::VisibleIf( | ||
VisibilityConstraintAtom::new_copy(constraint, 0), | ||
)); | ||
let later_vis_constraint_id = | ||
self.add_visibility_constraint(VisibilityConstraint::VisibleIf( | ||
VisibilityConstraintAtom::new_copy(constraint, 1), | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this could be a bit more ergonomic, and not have to expose VisibilityConstraintAtom
and the implementation detail of its internal u8
to the semantic index builder? But I see there are complexities there, and with only one call site at the moment I don't think it matters much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I removed the Atom
wrapper. My thought here was to reduce the number of , 0
s I'd have to add, but most of those were already hidden behind helper methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Making the u8
not visible at all would be more invasive, since that's how the caller up in the semantic index builder specifies that it wants distinct copies. It's up to the caller to choose the values, such that distinct u8
s refer to different evaluations of the constraint. Do you have thoughts on a different API shape to express that?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was vaguely envisioning some kind of copy
API that would take an existing constraint ID and bump the counter behind the scenes. But the awkwardness there is that a constraint ID might not represent a VisibleIf
constraint at all, making the API error prone. So I'm not sure that would be an improvement. It's already much nicer without the extra Atom
type, I think what you have is good.
/// The primitive atoms of the formula are [`Constraint`]s, which express some property of the | ||
/// runtime state of the code that we are analyzing. | ||
/// | ||
/// We assume that each atom has a stable value each time that the formula is evaluated. An atom | ||
/// that resolves to `Ambiguous` might be true or false, and we can't tell which — but within that | ||
/// evaluation, we assume that the atom has the _same_ unknown value each time it appears. That | ||
/// allows us to perform simplifications like `A ∨ !A → true` and `A ∧ !A → false`. | ||
/// | ||
/// That means that when you are constructing a formula, you might need to create distinct atoms | ||
/// for a particular [`Constraint`], if your formula needs to consider how a particular runtime | ||
/// property might be different at different points in the execution of the program. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's late here already, but just reading this, I wonder why it applies to while conditions, but not to something like
for _ in iterable:
if <cond>:
# …
Is it because we merge control flow twice here (after the if, after the for) instead of just once in the while loop? Would a break
point inside the if
complicate the situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because currently a for
loop generates no visibility constraints. (We don't attempt to detect statically-known-empty iterables.) If we did add that feature, we might need to consider this, but I'm still not sure it would apply unless we also tried to detect statically-known-infinite iterables! Which I doubt we would ever do.
EDIT: ignore the above, I totally missed the point of your comment, which was the nested if
test. Thinking about it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I've totally thought through all the possible concrete visibility constraint trees we might generate here, but my intuition says it's not an issue in this scenario, because there is no difference between any one iteration of the loop and any other one, in terms of the effect on which code is visible or not. Which is not true in a while
loop, where the key fact is that a static True
means we'll never exit the loop, a static False
means we'll never enter it, and a True
that later changes to False
means we can enter and then exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like the following two are similar control flow constructs (ignoring else
branches in loops…):
while cond:
# <code block>
and
for _ in itertools.repeat(None):
if not cond:
break
# <code block>
It looks like we can handle the former, but not the latter:
import itertools
cond = False
w = 1
f = 1
while cond:
w = 2
for _ in itertools.repeat(None):
if not cond:
break
f = 2
print(reveal_type(w)) # Literal[1]
print(reveal_type(f)) # Literal[1, 2] ?
Or am I missing something? This might not be related to this PR at all(?).
(Edit: I guess I could only expect the latter to be working with #15817)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more than #15817. The latter example is only equivalent to the former if we are able to detect that an iterator is infinite. This is not reflected in the type system, and would require special casing for certain iterables (like ‘itertools.repeat`); it's not possible in general.
If the iterator is not infinite (or not known to be infinite), then exit from the loop is always possible at every iteration, regardless of the value of cond
, and the result we currently give is correct.
Detecting known-empty iterators and their effect on for
loop control flow is similarly difficult to do in general.
This mimics a simplification we have on the OR side, where we simplify
A ∨ !A
to true. This requires changes to how we addwhile
statements to the semantic index, since we now need distinctVisibilityConstraint
s if we need to model evaluating aConstraint
multiple times at different points in the execution of the program.