-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiline regex in text filtering (#2857) #2889
base: master
Are you sure you want to change the base?
Conversation
else: | ||
ignore_text.append(k.strip()) | ||
|
||
for line in content.splitlines(keepends=True): | ||
i += 1 | ||
for r in ignore_regex_multiline: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what the approach is here so i dont have to manually compile it in my brain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It finds all matches over the whole content using finditer, without splitting it into lines.
For each match it calculates the start/end line number by using splitlines on part of the content before the match end, and the number of lines in the matched content.
Instead of accumulating output lines to an array, the line numbers are added to ignored_lines array which is then converted to a set to remove duplicates. This way multiple filtering methods can mark lines as ignored, and the output is constructed as the last step.
@@ -42,6 +41,46 @@ def test_strip_regex_text_func(): | |||
# Check line number reporting | |||
stripped_content = html_tools.strip_ignore_text(test_content, ignore_lines, mode="line numbers") | |||
assert stripped_content == [2, 5, 6, 7, 8, 10] | |||
|
|||
stripped_content = html_tools.strip_ignore_text(test_content, ['/but 1.+5 lines/s']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bumping the tests!
Fixes #2857.
Hopefully this does not break some obscure edge case.