-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make noqa parsing consistent and more robust #16483
base: main
Are you sure you want to change the base?
Conversation
CodSpeed Performance ReportMerging #16483 will not alter performanceComparing Summary
|
Oh jeez! Well that performance regression is no good. I'd still welcome thoughts on the "design spec" and I'll work on improving the implementation in the meantime. Converting to draft for now... |
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
F405 | 48 | 0 | 48 | 0 | 0 |
F811 | 9 | 0 | 9 | 0 | 0 |
F403 | 9 | 0 | 9 | 0 | 0 |
Linter (preview)
ℹ️ ecosystem check detected linter changes. (+0 -66 violations, +0 -0 fixes in 1 projects; 54 projects unchanged)
qdrant/qdrant-client (+0 -66 violations, +0 -0 fixes)
ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview
- qdrant_client/http/api/aliases_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/aliases_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/aliases_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/beta_api.py:51:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/beta_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/beta_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/collections_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/collections_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/collections_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/distributed_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/distributed_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/distributed_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/indexes_api.py:126:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:144:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:162:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:180:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:52:54: F405 `AsyncApiClient` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:59:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/indexes_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/indexes_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/indexes_api.py:94:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:158:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:192:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:226:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:356:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:424:19: F405 `WriteOrdering` may be undefined, or defined from star imports - qdrant_client/http/api/points_api.py:458:19: F405 `WriteOrdering` may be undefined, or defined from star imports ... 32 additional changes omitted for rule F405 - qdrant_client/http/api/points_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/points_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/search_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/search_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/service_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/service_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names - qdrant_client/http/api/snapshots_api.py:5:27: F811 Redefinition of unused `BaseModel` from line 4 - qdrant_client/http/api/snapshots_api.py:7:1: F403 `from qdrant_client.http.models import *` used; unable to detect undefined names ... 31 additional changes omitted for project
Changes by rule (3 rules affected)
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
F405 | 48 | 0 | 48 | 0 | 0 |
F811 | 9 | 0 | 9 | 0 | 0 |
F403 | 9 | 0 | 9 | 0 | 0 |
Turns out non-ascii whitespace is allowed anyway! This reverts commit 367d3a6.
@dylwil3 Additionally, the original implementation uses |
Yeah, I tried removing disallowed whitespace as well and I don't think it makes a difference. I think the regex is too heavy to compete with the manual approach for this simple of a pattern. I sort of agree regarding ascii whitespace, and I do like that the regex is a little more self documenting... I'm torn. I'll try re-implementing the manual approach tomorrow and see if it's worth it - I can always mention the regex we're implementing in the doc-comment even if it's not used directly. |
Okay, performance fixed by reverting to manually implementing regex (now at least the file exemption and in-line cases are handled in visibly the same way). I think the ecosystem check is actually correct, and is one of the changes in behavior consistent with the design: We are now allowing comments after a file-level exemption like: As an aside: It'd be nice if there were a not-too-complex way to alert users to the footgun |
comment_range: TextRange, | ||
source: &str, | ||
) -> Result<Option<ParsedNoqa<'_>>, ParseError> { | ||
let line = &source[comment_range]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before I review this in detail. Have you considered using Cursor
. I find that it can help simplify a lot of those basic if it's this character, eat it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good idea! I'll check it out. Feel free to review the design (e.g. the tests and snapshots) independently of the implementation, since the former may result in changes in the latter anyway. You can also hold off on both while I look into Cursor
- whatever is most convenient!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll wait. Reviewing PRs of this size take a lot of time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I can do specific parts if you have something that you'd like to get early feedback
Summary
The goal of this PR is to address various issues around parsing suppression comments by
# noqa
) and file-level (# ruff: noqa
) noqa commentsCloses #15682
Supersedes #12811
Addresses #14229 (comment)
Related: #14229 , #12809
Current suppression behavior
The implementations for parsing in-line and file-level noqa comments are entirely separate, so a number of discrepancies have arisen. You can unfurl some examples of this below.
Examples of discrepancies
Leading Hashes
This works:
This does not, and emits no warning:
Missing colon
This silently ignores all rules on the line, even though
the user intended a single rule to be ignored:
This prints a warning and does not suppress anything:
Missing Delimiter
This works to suppress both violations:
This warns and suppresses neither:
Multiple Commas
This works to suppress both violations:
This suppresses the first but not the second.
It does not emit a warning:
Missing whitespace before additional comments
This works:
This warns and does nothing:
Additional comments after noqa
This works:
And so does this:
And so does this:
But this emits a warning and suppresses nothing:
Design
The specification for the unified parsing logic is as follows:
#\s*(?:ruff|flake8)\s*:\s*(?i)noqa
#\s*(?i)noqa
noqa
is'#'
, whitespace, or if there are no characters following, then the suppression comment suppressed all rules. If the character following is":"
, proceed to (3). Otherwise, warn the user forInvalidSuffix
.MissingCodes
.[A-Z]+[0-9]+
. While it is expected that there is one rule code per item, parsing continues even if that is not true, (e.g.,F401F841
orF401,,F841
), and the user is warned of aMissingDelimiter
orMissingItem
. If an item does not consist entirely of codes (e.g.F401abc
), parsing aborts with an error and the user is warned of anInvalidCodeSuffix
- the exception is if we encounter#
, in which case parsing stops and returns all codes collected to that point (e.g.# noqa: F401#comment
).Implementation
Originally this was implemented closely following the above description, using
split(',')
andsplit_ascii_whitespace
and iterating along. However, aCode
consists of both a string slice and a range in the source code, and it proved too difficult (for me!) to keep track of the byte offsets with this approach. So we adopted a more manual approach encapsulated inNoqaParser
, with logic beginning inNoqaParser::parse
.A
ParsedNoqa
consists of a list of warnings, which are non-fatal syntax errors encountered during parsing (namelyMissingItem
andMissingDelimiter
) and the foundDirective
(which is eitherAll
or the list of codes and their ranges in the source text). One is really only expected to useparse_inline_noqa
andparse_file_exemption
, which are module-level methods that useNoqaParser
internally. This convention could be enforced if we wanted to moveNoqaParser
to another submodule (though we would also need to expose a wrapper aroundNoqaParser::parse_code
because that is used in the implementation of blanket-noqa (PGH004)).Other changes and affected rules/behavior
The only rules that appear to be affected are blanket-noqa (PGH004)) and unused-noqa (RUF100) (but we'll see if others come up in the ecosystem check). For example, we no longer parse the first code in
# noqa F401.F841
and instead emit a warning, which changes one of the snapshots forRUF100
.Internally, we no longer distinguish between a
Directive
and aParsedFileExemption
(keeping just the former), since these were used almost identically. They are carried around inside ofNoqaDirectiveLine
andFileNoqaDirectedLine
, respectively, so there is little risk in misunderstanding the context if we just useDirective
in both cases.Review
It may not quite be easiest to review commit-by-commit, but let me give a quick summary of the
git log
and you can choose your own adventure if it is helpful:ParsedNoqa
contains aParsedNoqaDirective
which then gets turned into either aDirective
(for in-line noqa comments) or aParsedFileExemption
(for file-level exemptions). Later we just keep the notion of aDirective
for both in-line and file-level suppression, and remove the other two. Note also thatPaseWarning
is unused in this commit.# noqa
rule codes #12811).Miscellaneous comments
parse
withlex
since this looks like a regular grammar to me... if someone feels strongly I can do that.# noqa
codes #14229 (comment)?