Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Should A ∧ !A always be false? #15839

Merged
merged 7 commits into from
Jan 31, 2025
Merged

[red-knot] Should A ∧ !A always be false? #15839

merged 7 commits into from
Jan 31, 2025

Conversation

dcreager
Copy link
Member

@dcreager dcreager commented Jan 30, 2025

This mimics a simplification we have on the OR side, where we simplify A ∨ !A to true. This requires changes to how we add while statements to the semantic index, since we now need distinct VisibilityConstraints if we need to model evaluating a Constraint multiple times at different points in the execution of the program.

@dcreager dcreager added internal An internal refactor or improvement red-knot Multi-file analysis & type inference labels Jan 30, 2025
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this holds, because we are dealing with three-valued logic (there is the Ambiguous possibility), and a value of Ambiguous truthiness can change truthiness between one time it is checked and another.

Multiple checks of the truthiness of the same condition only occur in while loops, which is why all of the test failures occur in while-loop tests. This PR models the premise that a while-loop test can never change truthiness over time, which would indeed lead to the conclusion seen in the test failures, that a while loop with no break must either be never entered at all, or must be an infinite loop once entered.

I think in order for this PR to be valid, we would need to create two separate visibility constraints for the same while-loop test expression, one for "as checked on first entry" and another for "as checked on subsequent iteration", so that we wouldn't actually see those as "the same constraint", and would thus model the possibility of entering the loop and then leaving it on a later iteration. (I think this same change in while loop visibility constraints would be necessary if we switch to BDDs for visibility constraints.)

I think technically the existing OR simplification is wrong for the same reason? But in practice it doesn't cause a problem, either because the OR simplification case doesn't arise with while loops, or because simplifying to ALWAYS_TRUE doesn't lead us to eliminate paths as unreachable, or for both reasons.

@carljm
Copy link
Contributor

carljm commented Jan 31, 2025

cc @sharkdp, you might find this interesting.

@dcreager
Copy link
Member Author

dcreager commented Jan 31, 2025

a value of Ambiguous truthiness can change truthiness between one time it is checked and another.

Yep, that makes sense! The part that I wasn't sure about was what the scope of "one time it is checked and another" is. My intuition would be that a literal (in the boolean formula sense, not the Python source sense) might evaluate to Ambiguous, meaning that we don't know if it's true or false — but that within the context of a particular evaluation of the boolean formula, it would have the same true or false value each time it appears. Which would make A ∧ !A == 0 hold.

Multiple checks of the truthiness of the same condition only occur in while loops, which is why all of the test failures occur in while-loop tests.

Or put another way, while-loop tests are the only place (at least for now) where we need a visibility constraint to model the relationship between two different states in the execution trace of the program.

I think technically the existing OR simplification is wrong for the same reason?

Definitely agree — and I think that's a better wording of my question! 😄 Either both A ∧ !A == 0 and A ∨ !A == 1 should hold, or neither should.

@carljm
Copy link
Contributor

carljm commented Jan 31, 2025

within the context of a particular evaluation of the boolean formula, it would have the same true or false value each time it appears

The problem is that we construct boolean formulae (for visibility of paths through a while loop) which contain a single constraint that represents checks of the while-test expression's truthiness at different points in execution state (first entry vs later iteration). Because we do this, we create the scenario that "the same constraint" in the same boolean formula doesn't necessarily have a consistent truthiness value, even for a single evaluation of the formula.

If we instead created separate visibility constraints (referring to the same test expression) for the truthiness checks in these two different execution states, then the simplification would hold. (I think this change in the semantic index builder's handling of while loops would not be too difficult, but I haven't tried to make the change.)

@carljm
Copy link
Contributor

carljm commented Jan 31, 2025

Once we model the back-edge in control flow in a while loop, we could even evaluate the test expression under two different conditions, one without the back-edge (for first entry) and one with the back-edge (for later iteration). This could change our evaluation of its static truthiness, too. (Imagine an x = True; while x: x = get_bool().)

* main:
  [`flake8-pyi`] Fix several correctness issues with `custom-type-var-return-type` (`PYI019`) (#15851)
  [`pyupgrade`] Reuse replacement logic from `UP046` and `UP047` (`UP040`) (#15840)
  [`refurb`] Avoid `None | None` as well as better detection and fix (`FURB168`) (#15779)
  Remove non-existing `lint.extendIgnore` editor setting (#15844)
  [`refurb`] Mark fix as unsafe if there are comments (`FURB171`) (#15832)
  [`flake8-comprehensions`] Skip when `TypeError` present from too many (kw)args for `C410`,`C411`, and `C418` (#15838)
  [`pyflakes`] Visit forward annotations in `TypeAliasType` as types (`F401`) (#15829)
@dcreager
Copy link
Member Author

I think in order for this PR to be valid, we would need to create two separate visibility constraints for the same while-loop test expression, one for "as checked on first entry" and another for "as checked on subsequent iteration", so that we wouldn't actually see those as "the same constraint", and would thus model the possibility of entering the loop and then leaving it on a later iteration. (I think this same change in while loop visibility constraints would be necessary if we switch to BDDs for visibility constraints.)

This is done!

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this fixes the test failures, sweet!

Comment on lines 1002 to 1012
// The first time the body is executed, we will have only checked the first
// evaluation of the condition, which must be True. For the second and later times,
// the first evaluation must still have been True, and a later evaluation was also
// True. So the full visibility constraint is (first ∨ (first ∧ later)).
let first_and_later = self
.current_visibility_constraints_mut()
.add_and_constraint(first_vis_constraint_id, later_vis_constraint_id);
let body_vis_constraint_id = self
.current_visibility_constraints_mut()
.add_or_constraint(first_vis_constraint_id, first_and_later);
self.record_visibility_constraint_id(body_vis_constraint_id);
Copy link
Contributor

@carljm carljm Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't (first ∨ (first ∧ later)) simplify to just first? If first is true, it is always true; if first is false, it is always false.

Which I think is also intuitive, because for the body of the while loop to be reachable, it is both necessary and sufficient that the first evaluation of the test condition be truthy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch! I had thought I needed to assert that later evaluations of the condition also evaluate to true. But this shows that that's not the right interpretation — we don't know that we did evaluate the condition a second time, and so executing the body does not imply that (there was) a later evaluation (and it) evaluated to true.

Comment on lines 983 to 990
let first_vis_constraint_id =
self.add_visibility_constraint(VisibilityConstraint::VisibleIf(
VisibilityConstraintAtom::new_copy(constraint, 0),
));
let later_vis_constraint_id =
self.add_visibility_constraint(VisibilityConstraint::VisibleIf(
VisibilityConstraintAtom::new_copy(constraint, 1),
));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be a bit more ergonomic, and not have to expose VisibilityConstraintAtom and the implementation detail of its internal u8 to the semantic index builder? But I see there are complexities there, and with only one call site at the moment I don't think it matters much.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I removed the Atom wrapper. My thought here was to reduce the number of , 0s I'd have to add, but most of those were already hidden behind helper methods

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Making the u8 not visible at all would be more invasive, since that's how the caller up in the semantic index builder specifies that it wants distinct copies. It's up to the caller to choose the values, such that distinct u8s refer to different evaluations of the constraint. Do you have thoughts on a different API shape to express that?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was vaguely envisioning some kind of copy API that would take an existing constraint ID and bump the counter behind the scenes. But the awkwardness there is that a constraint ID might not represent a VisibleIf constraint at all, making the API error prone. So I'm not sure that would be an improvement. It's already much nicer without the extra Atom type, I think what you have is good.

@dcreager dcreager merged commit fab86de into main Jan 31, 2025
21 checks passed
@dcreager dcreager deleted the dcreager/visconand branch January 31, 2025 19:06
Comment on lines +175 to +185
/// The primitive atoms of the formula are [`Constraint`]s, which express some property of the
/// runtime state of the code that we are analyzing.
///
/// We assume that each atom has a stable value each time that the formula is evaluated. An atom
/// that resolves to `Ambiguous` might be true or false, and we can't tell which — but within that
/// evaluation, we assume that the atom has the _same_ unknown value each time it appears. That
/// allows us to perform simplifications like `A ∨ !A → true` and `A ∧ !A → false`.
///
/// That means that when you are constructing a formula, you might need to create distinct atoms
/// for a particular [`Constraint`], if your formula needs to consider how a particular runtime
/// property might be different at different points in the execution of the program.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's late here already, but just reading this, I wonder why it applies to while conditions, but not to something like

for _ in iterable:
    if <cond>:
        # …

Is it because we merge control flow twice here (after the if, after the for) instead of just once in the while loop? Would a break point inside the if complicate the situation?

Copy link
Contributor

@carljm carljm Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because currently a for loop generates no visibility constraints. (We don't attempt to detect statically-known-empty iterables.) If we did add that feature, we might need to consider this, but I'm still not sure it would apply unless we also tried to detect statically-known-infinite iterables! Which I doubt we would ever do.

EDIT: ignore the above, I totally missed the point of your comment, which was the nested if test. Thinking about it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I've totally thought through all the possible concrete visibility constraint trees we might generate here, but my intuition says it's not an issue in this scenario, because there is no difference between any one iteration of the loop and any other one, in terms of the effect on which code is visible or not. Which is not true in a while loop, where the key fact is that a static True means we'll never exit the loop, a static False means we'll never enter it, and a True that later changes to False means we can enter and then exit.

Copy link
Contributor

@sharkdp sharkdp Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me like the following two are similar control flow constructs (ignoring else branches in loops…):

while cond:
    # <code block>

and

for _ in itertools.repeat(None):
    if not cond:
        break
    # <code block>

It looks like we can handle the former, but not the latter:

import itertools

cond = False

w = 1
f = 1

while cond:
    w = 2

for _ in itertools.repeat(None):
    if not cond:
        break

    f = 2

print(reveal_type(w))  # Literal[1]
print(reveal_type(f))  # Literal[1, 2] ?

Or am I missing something? This might not be related to this PR at all(?).

(Edit: I guess I could only expect the latter to be working with #15817)

Copy link
Contributor

@carljm carljm Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more than #15817. The latter example is only equivalent to the former if we are able to detect that an iterator is infinite. This is not reflected in the type system, and would require special casing for certain iterables (like ‘itertools.repeat`); it's not possible in general.

If the iterator is not infinite (or not known to be infinite), then exit from the loop is always possible at every iteration, regardless of the value of cond, and the result we currently give is correct.

Detecting known-empty iterators and their effect on for loop control flow is similarly difficult to do in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants