Make noqa parsing consistent and more robust #16483

dylwil3 · 2025-03-03T21:31:39Z

Summary

The goal of this PR is to address various issues around parsing suppression comments by

Unifying the logic used to parse in-line (# noqa) and file-level (# ruff: noqa) noqa comments
Recovering from certain errors and surfacing warnings in these cases

Closes #15682
Supersedes #12811
Addresses #14229 (comment)
Related: #14229 , #12809

Current suppression behavior

The implementations for parsing in-line and file-level noqa comments are entirely separate, so a number of discrepancies have arisen. You can unfurl some examples of this below.

Examples of discrepancies

Leading Hashes

This works:

from typing import List ## noqa: UP035

This does not, and emits no warning:

from typing import List
## ruff: noqa: UP035

Missing colon

This silently ignores all rules on the line, even though
the user intended a single rule to be ignored:

from typing import List # noqa UP035

This prints a warning and does not suppress anything:

from typing import List 
# ruff: noqa UP035

Missing Delimiter

This works to suppress both violations:

from typing import List # noqa: UP035F401

This warns and suppresses neither:

from typing import List 
#ruff: noqa: UP035F401

Multiple Commas

This works to suppress both violations:

from typing import List #noqa: UP035,,,F401

This suppresses the first but not the second.
It does not emit a warning:

from typing import List 
#ruff:noqa: UP035,,,F401

Missing whitespace before additional comments

This works:

from typing import List #noqa: UP035abc

This warns and does nothing:

from typing import List 
#ruff:noqa: UP035abc

Additional comments after noqa

This works:

from typing import List #noqa: UP035 and more comments

And so does this:

from typing import List #noqa and more comments

And so does this:

from typing import List 
#ruff: noqa: UP035 and more comments

But this emits a warning and suppresses nothing:

from typing import List 
#ruff:noqa and more comments

Design

The specification for the unified parsing logic is as follows:

A suppression comment must be contained in a comment range and begins with a prefix matching one of the following regexes:

(file level) #\s*(?:ruff|flake8)\s*:\s*(?i)noqa
(inline) #\s*(?i)noqa

If the character following the noqa is '#', whitespace, or if there are no characters following, then the suppression comment suppressed all rules. If the character following is ":", proceed to (3). Otherwise, warn the user for InvalidSuffix.
We expect a list of rule codes, separated by commas or whitespace. After splitting first on commas, and then on whitespace, we attempt to parse each item in turn. If parsing a nonempty item returns no rule codes, we stop. If no codes were found at all we warn the user with MissingCodes.
A rule code is a match for the regex [A-Z]+[0-9]+. While it is expected that there is one rule code per item, parsing continues even if that is not true, (e.g., F401F841 or F401,,F841), and the user is warned of a MissingDelimiter or MissingItem. If an item does not consist entirely of codes (e.g. F401abc), parsing aborts with an error and the user is warned of an InvalidCodeSuffix - the exception is if we encounter #, in which case parsing stops and returns all codes collected to that point (e.g. # noqa: F401#comment).

Implementation

Originally this was implemented closely following the above description, using split(',') and split_ascii_whitespace and iterating along. However, a Code consists of both a string slice and a range in the source code, and it proved too difficult (for me!) to keep track of the byte offsets with this approach. So we adopted a more manual approach encapsulated in NoqaParser, with logic beginning in NoqaParser::parse.

A ParsedNoqa consists of a list of warnings, which are non-fatal syntax errors encountered during parsing (namely MissingItem and MissingDelimiter) and the found Directive (which is either All or the list of codes and their ranges in the source text). One is really only expected to use parse_inline_noqa and parse_file_exemption, which are module-level methods that use NoqaParser internally. This convention could be enforced if we wanted to move NoqaParser to another submodule (though we would also need to expose a wrapper around NoqaParser::parse_code because that is used in the implementation of blanket-noqa (PGH004)).

Other changes and affected rules/behavior

The only rules that appear to be affected are blanket-noqa (PGH004)) and unused-noqa (RUF100) (but we'll see if others come up in the ecosystem check). For example, we no longer parse the first code in # noqa F401.F841 and instead emit a warning, which changes one of the snapshots for RUF100.

Internally, we no longer distinguish between a Directive and a ParsedFileExemption (keeping just the former), since these were used almost identically. They are carried around inside of NoqaDirectiveLine and FileNoqaDirectedLine, respectively, so there is little risk in misunderstanding the context if we just use Directive in both cases.

Review

It may not quite be easiest to review commit-by-commit, but let me give a quick summary of the git log and you can choose your own adventure if it is helpful:

The current behavior of noqa parsing, including some new tests related to the examples I gave above, is immortalized in the first commit cadeb2d . You can probably directly compare those snapshots with the most recent ones, even though the more recent ones also include warnings. If the warnings are noisy you can compare the first commit with the commit 281ec04 which does not yet show warnings.
The core re-implementation is done in 2ef5386 . Since we are mixing re-implementation with refactoring, we temporarily introduce some structs and functionality here that we later get rid of. Specifically: the ParsedNoqa contains a ParsedNoqaDirective which then gets turned into either a Directive (for in-line noqa comments) or a ParsedFileExemption (for file-level exemptions). Later we just keep the notion of a Directive for both in-line and file-level suppression, and remove the other two. Note also that PaseWarning is unused in this commit.
Warnings are emitted in 5c7cdfe (which is what finally settles Warn on invalid # noqa rule codes #12811).
The clippy commit ffb0e96 was actually a bit bigger than I anticipated - I should have been running it throughout! - so it's probably best to hold off on more specific coding style comments until you get to the end.

Miscellaneous comments

In general I'm not the happiest with the complexity of the implementation, so don't be afraid to suggest ripping it out and starting fresh if you have a rough idea for a different approach.
Probably I should replace all instances of parse with lex since this looks like a regular grammar to me... if someone feels strongly I can do that.
If folks think there are natural boundaries to do so, I'm happy to break this up into smaller PRs.
Maybe @InSyncWithFoo and @koxudaxi would be interested in looking over at least the design spec here, given Avoid treating lowercase letters as # noqa codes #14229 (comment)?

codspeed-hq · 2025-03-03T21:36:53Z

CodSpeed Performance Report

Merging #16483 will not alter performance

_{Comparing dylwil3:noqa-parsing (2165d44) with main (37fbe58)}

Summary

✅ 32 untouched benchmarks

dylwil3 · 2025-03-03T21:38:49Z

Oh jeez! Well that performance regression is no good. I'd still welcome thoughts on the "design spec" and I'll work on improving the implementation in the meantime. Converting to draft for now...

github-actions · 2025-03-03T21:42:10Z

`ruff-ecosystem` results

Linter (stable)

ℹ️ ecosystem check detected linter changes. (+0 -66 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

qdrant/qdrant-client (+0 -66 violations, +0 -0 fixes)

- qdrant_client/http/api/aliases_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/aliases_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/aliases_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/beta_api.py:51:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/beta_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/beta_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/collections_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/collections_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/collections_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/distributed_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/distributed_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/distributed_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:126:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:144:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:162:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:180:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:59:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/indexes_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:94:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:158:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:192:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:226:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:356:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:424:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:458:19: F405 `WriteOrdering` may be undefined, or defined from star imports
... 32 additional changes omitted for rule F405
- qdrant_client/http/api/points_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/points_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/search_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/search_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/service_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/service_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/snapshots_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/snapshots_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
... 31 additional changes omitted for project

Changes by rule (3 rules affected)

code	total	- violation
F405	48	48
F811	9	9
F403	9	9

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -66 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)

qdrant/qdrant-client (+0 -66 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

- qdrant_client/http/api/aliases_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/aliases_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/aliases_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/beta_api.py:51:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/beta_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/beta_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/collections_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/collections_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/collections_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/distributed_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/distributed_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/distributed_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:126:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:144:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:162:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:180:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:59:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/indexes_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/indexes_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/indexes_api.py:94:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:158:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:192:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:226:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:356:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:424:19: F405 `WriteOrdering` may be undefined, or defined from star imports
- qdrant_client/http/api/points_api.py:458:19: F405 `WriteOrdering` may be undefined, or defined from star imports
... 32 additional changes omitted for rule F405
- qdrant_client/http/api/points_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/points_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/search_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/search_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/service_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/service_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
- qdrant_client/http/api/snapshots_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4
- qdrant_client/http/api/snapshots_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names
... 31 additional changes omitted for project

Changes by rule (3 rules affected)

code	total	- violation
F405	48	48
F811	9	9
F403	9	9

Turns out non-ascii whitespace is allowed anyway! This reverts commit 367d3a6.

InSyncWithFoo · 2025-03-03T23:25:18Z

@dylwil3 \r is actually not allowed in a comment, since it signifies the end of that physical line (same as \n and \r\n) and thus also the end of the comment.

Additionally, the original implementation uses char.is_whitespace(), which allows non-ASCII whitespace characters as well. In #12809, I argued that only tabs and spaces should be allowed (and I would do so again), but Charlie thought Ruff should be more permissive in that regard.

dylwil3 · 2025-03-03T23:45:59Z

@dylwil3 \r is actually not allowed in a comment, since it signifies the end of that physical line (same as \n and \r\n) and thus also the end of the comment.

Additionally, the original implementation uses char.is_whitespace(), which allows non-ASCII whitespace characters as well. In #12809, I argued that only tabs and spaces should be allowed (and I would do so again), but Charlie thought Ruff should be more permissive in that regard.

Yeah, I tried removing disallowed whitespace as well and I don't think it makes a difference. I think the regex is too heavy to compete with the manual approach for this simple of a pattern.

I sort of agree regarding ascii whitespace, and I do like that the regex is a little more self documenting... I'm torn.

I'll try re-implementing the manual approach tomorrow and see if it's worth it - I can always mention the regex we're implementing in the doc-comment even if it's not used directly.

dylwil3 · 2025-03-04T14:05:53Z

Okay, performance fixed by reverting to manually implementing regex (now at least the file exemption and in-line cases are handled in visibly the same way).

I think the ecosystem check is actually correct, and is one of the changes in behavior consistent with the design: We are now allowing comments after a file-level exemption like: # ruff: noqa This is a bad file don't lint it. Therefore, in the same way that # noqa F401 is a probable typo that suppresses all rules, so too does # ruff: noqa F401 suppress all rules. This mistake is caught by blanket-noqa (PGH004) (in preview).

As an aside: It'd be nice if there were a not-too-complex way to alert users to the footgun # noqa F401 without them turning on PGH004 (since it's not a default rule). Maybe RUF100 could handle this (not a default but I think somewhat more popular)?

MichaReiser · 2025-03-04T14:32:42Z

crates/ruff_linter/src/noqa.rs

+    comment_range: TextRange,
+    source: &str,
+) -> Result<Option<ParsedNoqa<'_>>, ParseError> {
+    let line = &source[comment_range];


Before I review this in detail. Have you considered using Cursor. I find that it can help simplify a lot of those basic if it's this character, eat it

Sounds like a good idea! I'll check it out. Feel free to review the design (e.g. the tests and snapshots) independently of the implementation, since the former may result in changes in the latter anyway. You can also hold off on both while I look into Cursor - whatever is most convenient!

I think I'll wait. Reviewing PRs of this size take a lot of time

But I can do specific parts if you have something that you'd like to get early feedback

dylwil3 added 12 commits March 3, 2025 07:24

tests and snapshots to capture current behavior

cadeb2d

match function signatures for directive and file exemption

8e0c806

test that code text matches slice at range

7d09684

core parsing reimplementation

2ef5386

update snapshots for parsing behavior

281ec04

update noqa.py fixture and snapshot for new behavior

93c570f

emit warnings for inline and file-level exemptions

5c7cdfe

include warnings in snapshot tests

fe97252

remove ParsedFileExemption

f17fbd3

remove Directive::lex_code

0f8c38a

replace ParsedNoqaDirective with Directive everywhere

41e10e1

clippy

ffb0e96

dylwil3 added breaking Breaking API change suppression Related to supression of violations e.g. noqa do-not-merge Do not merge this pull request labels Mar 3, 2025

dylwil3 added this to the v0.10 milestone Mar 3, 2025

dylwil3 marked this pull request as draft March 3, 2025 21:41

dylwil3 added 3 commits March 3, 2025 15:52

ascii whitespace only in regex

367d3a6

revert "ascii whitespace only in regex" and try again

0e865f5

Turns out non-ascii whitespace is allowed anyway! This reverts commit 367d3a6.

try explicit case-insensitive

5488402

dylwil3 added 2 commits March 4, 2025 07:30

hand roll noqa prefix regex

663198f

hand roll file exemption prefix regex

2165d44

dylwil3 marked this pull request as ready for review March 4, 2025 14:05

MichaReiser reviewed Mar 4, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make noqa parsing consistent and more robust #16483

Make noqa parsing consistent and more robust #16483

dylwil3 commented Mar 3, 2025

codspeed-hq bot commented Mar 3, 2025 •

edited

Loading

dylwil3 commented Mar 3, 2025 •

edited

Loading

github-actions bot commented Mar 3, 2025 •

edited

Loading

InSyncWithFoo commented Mar 3, 2025 •

edited

Loading

dylwil3 commented Mar 3, 2025

dylwil3 commented Mar 4, 2025 •

edited

Loading

MichaReiser Mar 4, 2025

dylwil3 Mar 4, 2025

MichaReiser Mar 4, 2025

MichaReiser Mar 4, 2025

Make noqa parsing consistent and more robust #16483

Are you sure you want to change the base?

Make noqa parsing consistent and more robust #16483

Conversation

dylwil3 commented Mar 3, 2025

Summary

Current suppression behavior

Leading Hashes

Missing colon

Missing Delimiter

Multiple Commas

Missing whitespace before additional comments

Additional comments after noqa

Design

Implementation

Other changes and affected rules/behavior

Review

Miscellaneous comments

codspeed-hq bot commented Mar 3, 2025 • edited Loading

CodSpeed Performance Report

Merging #16483 will not alter performance

Summary

dylwil3 commented Mar 3, 2025 • edited Loading

github-actions bot commented Mar 3, 2025 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

InSyncWithFoo commented Mar 3, 2025 • edited Loading

dylwil3 commented Mar 3, 2025

dylwil3 commented Mar 4, 2025 • edited Loading

MichaReiser Mar 4, 2025

Choose a reason for hiding this comment

dylwil3 Mar 4, 2025

Choose a reason for hiding this comment

MichaReiser Mar 4, 2025

Choose a reason for hiding this comment

MichaReiser Mar 4, 2025

Choose a reason for hiding this comment

codspeed-hq bot commented Mar 3, 2025 •

edited

Loading

dylwil3 commented Mar 3, 2025 •

edited

Loading

github-actions bot commented Mar 3, 2025 •

edited

Loading

`ruff-ecosystem` results

InSyncWithFoo commented Mar 3, 2025 •

edited

Loading

dylwil3 commented Mar 4, 2025 •

edited

Loading