Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make noqa parsing consistent and more robust #16483

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

dylwil3
Copy link
Collaborator

@dylwil3 dylwil3 commented Mar 3, 2025

Summary

The goal of this PR is to address various issues around parsing suppression comments by

  1. Unifying the logic used to parse in-line (# noqa) and file-level (# ruff: noqa) noqa comments
  2. Recovering from certain errors and surfacing warnings in these cases

Closes #15682
Supersedes #12811
Addresses #14229 (comment)
Related: #14229 , #12809


Current suppression behavior

The implementations for parsing in-line and file-level noqa comments are entirely separate, so a number of discrepancies have arisen. You can unfurl some examples of this below.

Examples of discrepancies

Leading Hashes

This works:

from typing import List ## noqa: UP035

This does not, and emits no warning:

from typing import List
## ruff: noqa: UP035

Missing colon

This silently ignores all rules on the line, even though
the user intended a single rule to be ignored:

from typing import List # noqa UP035

This prints a warning and does not suppress anything:

from typing import List 
# ruff: noqa UP035

Missing Delimiter

This works to suppress both violations:

from typing import List # noqa: UP035F401

This warns and suppresses neither:

from typing import List 
#ruff: noqa: UP035F401

Multiple Commas

This works to suppress both violations:

from typing import List #noqa: UP035,,,F401

This suppresses the first but not the second.
It does not emit a warning:

from typing import List 
#ruff:noqa: UP035,,,F401

Missing whitespace before additional comments

This works:

from typing import List #noqa: UP035abc

This warns and does nothing:

from typing import List 
#ruff:noqa: UP035abc

Additional comments after noqa

This works:

from typing import List #noqa: UP035 and more comments

And so does this:

from typing import List #noqa and more comments

And so does this:

from typing import List 
#ruff: noqa: UP035 and more comments

But this emits a warning and suppresses nothing:

from typing import List 
#ruff:noqa and more comments

Design

The specification for the unified parsing logic is as follows:

  1. A suppression comment must be contained in a comment range and begins with a prefix matching one of the following regexes:
  • (file level) #\s*(?:ruff|flake8)\s*:\s*(?i)noqa
  • (inline) #\s*(?i)noqa
  1. If the character following the noqa is '#', whitespace, or if there are no characters following, then the suppression comment suppressed all rules. If the character following is ":", proceed to (3). Otherwise, warn the user for InvalidSuffix.
  2. We expect a list of rule codes, separated by commas or whitespace. After splitting first on commas, and then on whitespace, we attempt to parse each item in turn. If parsing a nonempty item returns no rule codes, we stop. If no codes were found at all we warn the user with MissingCodes.
  3. A rule code is a match for the regex [A-Z]+[0-9]+. While it is expected that there is one rule code per item, parsing continues even if that is not true, (e.g., F401F841 or F401,,F841), and the user is warned of a MissingDelimiter or MissingItem. If an item does not consist entirely of codes (e.g. F401abc), parsing aborts with an error and the user is warned of an InvalidCodeSuffix - the exception is if we encounter #, in which case parsing stops and returns all codes collected to that point (e.g. # noqa: F401#comment).

Implementation

Originally this was implemented closely following the above description, using split(',') and split_ascii_whitespace and iterating along. However, a Code consists of both a string slice and a range in the source code, and it proved too difficult (for me!) to keep track of the byte offsets with this approach. So we adopted a more manual approach encapsulated in NoqaParser, with logic beginning in NoqaParser::parse.

A ParsedNoqa consists of a list of warnings, which are non-fatal syntax errors encountered during parsing (namely MissingItem and MissingDelimiter) and the found Directive (which is either All or the list of codes and their ranges in the source text). One is really only expected to use parse_inline_noqa and parse_file_exemption, which are module-level methods that use NoqaParser internally. This convention could be enforced if we wanted to move NoqaParser to another submodule (though we would also need to expose a wrapper around NoqaParser::parse_code because that is used in the implementation of blanket-noqa (PGH004)).

Other changes and affected rules/behavior

The only rules that appear to be affected are blanket-noqa (PGH004)) and unused-noqa (RUF100) (but we'll see if others come up in the ecosystem check). For example, we no longer parse the first code in # noqa F401.F841 and instead emit a warning, which changes one of the snapshots for RUF100.

Internally, we no longer distinguish between a Directive and a ParsedFileExemption (keeping just the former), since these were used almost identically. They are carried around inside of NoqaDirectiveLine and FileNoqaDirectedLine, respectively, so there is little risk in misunderstanding the context if we just use Directive in both cases.

Review

It may not quite be easiest to review commit-by-commit, but let me give a quick summary of the git log and you can choose your own adventure if it is helpful:

  1. The current behavior of noqa parsing, including some new tests related to the examples I gave above, is immortalized in the first commit cadeb2d . You can probably directly compare those snapshots with the most recent ones, even though the more recent ones also include warnings. If the warnings are noisy you can compare the first commit with the commit 281ec04 which does not yet show warnings.
  2. The core re-implementation is done in 2ef5386 . Since we are mixing re-implementation with refactoring, we temporarily introduce some structs and functionality here that we later get rid of. Specifically: the ParsedNoqa contains a ParsedNoqaDirective which then gets turned into either a Directive (for in-line noqa comments) or a ParsedFileExemption (for file-level exemptions). Later we just keep the notion of a Directive for both in-line and file-level suppression, and remove the other two. Note also that PaseWarning is unused in this commit.
  3. Warnings are emitted in 5c7cdfe (which is what finally settles Warn on invalid # noqa rule codes #12811).
  4. The clippy commit ffb0e96 was actually a bit bigger than I anticipated - I should have been running it throughout! - so it's probably best to hold off on more specific coding style comments until you get to the end.

Miscellaneous comments

  • In general I'm not the happiest with the complexity of the implementation, so don't be afraid to suggest ripping it out and starting fresh if you have a rough idea for a different approach.
  • Probably I should replace all instances of parse with lex since this looks like a regular grammar to me... if someone feels strongly I can do that.
  • If folks think there are natural boundaries to do so, I'm happy to break this up into smaller PRs.
  • Maybe @InSyncWithFoo and @koxudaxi would be interested in looking over at least the design spec here, given Avoid treating lowercase letters as # noqa codes #14229 (comment)?

@dylwil3 dylwil3 added breaking Breaking API change suppression Related to supression of violations e.g. noqa do-not-merge Do not merge this pull request labels Mar 3, 2025
@dylwil3 dylwil3 added this to the v0.10 milestone Mar 3, 2025
Copy link

codspeed-hq bot commented Mar 3, 2025

CodSpeed Performance Report

Merging #16483 will not alter performance

Comparing dylwil3:noqa-parsing (2165d44) with main (37fbe58)

Summary

✅ 32 untouched benchmarks

@dylwil3
Copy link
Collaborator Author

dylwil3 commented Mar 3, 2025

Oh jeez! Well that performance regression is no good. I'd still welcome thoughts on the "design spec" and I'll work on improving the implementation in the meantime. Converting to draft for now...

@dylwil3 dylwil3 marked this pull request as draft March 3, 2025 21:41
Copy link
Contributor

github-actions bot commented Mar 3, 2025

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+0 -66 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

qdrant/qdrant-client (+0 -66 violations, +0 -0 fixes)

- qdrant_client/http/api/aliases_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/aliases_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/aliases_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/beta_api.py:51:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/beta_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/beta_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/collections_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/collections_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/collections_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/distributed_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/distributed_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/distributed_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:126:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:144:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:162:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:180:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:59:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/indexes_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:94:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:158:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:192:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:226:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:356:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:424:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:458:19: F405 `WriteOrdering` may be undefined, or defined from star imports
... 32 additional changes omitted for rule F405
- qdrant_client/http/api/points_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/points_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/search_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/search_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/service_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/service_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/snapshots_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/snapshots_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
... 31 additional changes omitted for project

Changes by rule (3 rules affected)

code total + violation - violation + fix - fix
F405 48 0 48 0 0
F811 9 0 9 0 0
F403 9 0 9 0 0

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -66 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

qdrant/qdrant-client (+0 -66 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

- qdrant_client/http/api/aliases_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/aliases_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/aliases_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/beta_api.py:51:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/beta_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/beta_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/collections_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/collections_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/collections_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/distributed_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/distributed_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/distributed_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:126:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:144:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:162:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:180:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:59:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/indexes_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:94:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:158:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:192:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:226:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:356:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:424:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:458:19: F405 `WriteOrdering` may be undefined, or defined from star imports
... 32 additional changes omitted for rule F405
- qdrant_client/http/api/points_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/points_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/search_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/search_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/service_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/service_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/snapshots_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/snapshots_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
... 31 additional changes omitted for project

Changes by rule (3 rules affected)

code total + violation - violation + fix - fix
F405 48 0 48 0 0
F811 9 0 9 0 0
F403 9 0 9 0 0

@InSyncWithFoo
Copy link
Contributor

InSyncWithFoo commented Mar 3, 2025

@dylwil3 \r is actually not allowed in a comment, since it signifies the end of that physical line (same as \n and \r\n) and thus also the end of the comment.

Additionally, the original implementation uses char.is_whitespace(), which allows non-ASCII whitespace characters as well. In #12809, I argued that only tabs and spaces should be allowed (and I would do so again), but Charlie thought Ruff should be more permissive in that regard.

@dylwil3
Copy link
Collaborator Author

dylwil3 commented Mar 3, 2025

@dylwil3 \r is actually not allowed in a comment, since it signifies the end of that physical line (same as \n and \r\n) and thus also the end of the comment.

Additionally, the original implementation uses char.is_whitespace(), which allows non-ASCII whitespace characters as well. In #12809, I argued that only tabs and spaces should be allowed (and I would do so again), but Charlie thought Ruff should be more permissive in that regard.

Yeah, I tried removing disallowed whitespace as well and I don't think it makes a difference. I think the regex is too heavy to compete with the manual approach for this simple of a pattern.

I sort of agree regarding ascii whitespace, and I do like that the regex is a little more self documenting... I'm torn.

I'll try re-implementing the manual approach tomorrow and see if it's worth it - I can always mention the regex we're implementing in the doc-comment even if it's not used directly.

@dylwil3
Copy link
Collaborator Author

dylwil3 commented Mar 4, 2025

Okay, performance fixed by reverting to manually implementing regex (now at least the file exemption and in-line cases are handled in visibly the same way).

I think the ecosystem check is actually correct, and is one of the changes in behavior consistent with the design: We are now allowing comments after a file-level exemption like: # ruff: noqa This is a bad file don't lint it. Therefore, in the same way that # noqa F401 is a probable typo that suppresses all rules, so too does # ruff: noqa F401 suppress all rules. This mistake is caught by blanket-noqa (PGH004) (in preview).

As an aside: It'd be nice if there were a not-too-complex way to alert users to the footgun # noqa F401 without them turning on PGH004 (since it's not a default rule). Maybe RUF100 could handle this (not a default but I think somewhat more popular)?

@dylwil3 dylwil3 marked this pull request as ready for review March 4, 2025 14:05
comment_range: TextRange,
source: &str,
) -> Result<Option<ParsedNoqa<'_>>, ParseError> {
let line = &source[comment_range];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before I review this in detail. Have you considered using Cursor. I find that it can help simplify a lot of those basic if it's this character, eat it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea! I'll check it out. Feel free to review the design (e.g. the tests and snapshots) independently of the implementation, since the former may result in changes in the latter anyway. You can also hold off on both while I look into Cursor - whatever is most convenient!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll wait. Reviewing PRs of this size take a lot of time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I can do specific parts if you have something that you'd like to get early feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking API change do-not-merge Do not merge this pull request suppression Related to supression of violations e.g. noqa
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unused-noqa (RUF100) - false negatives and strange behavior with multiple codes
3 participants