-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve DX with babel-plugin-transform-regex
#704
Comments
Thanks for the idea! I am not sure how to proceed. I can see the advantages, but also some disadvantages.
|
Those are all reasonable, but I'd make a few points in response.
IMO this is minor compared to what you're currently expecting of developers who want to fix or improve the regexes, given the current, more-or-less unmaintainable, giant regexes that offer no accompanying guidance or features that make them self-documenting.
Good point, but it can be made better. I'm envisioning an intermediate step that does this transform before running tests, and before additional bundling/minification/etc. More complex solutions are also possible, like importing
Tests can be made to work as noted above. But unmaintainable regexes that few understand or can audit offer their own form of security concern, that tests are not a full solution for. The emoji regex fix in #666 offers an illustrative example. Valibot had tests for it, but didn't think to test that e.g. it should not match plain digits or Unicode format/modifier characters. This stemmed from how it wasn't easy to understand exactly what the regex did and did not match (although in this particular case it was also about understanding the complexity of emoji). In any case, no worries at all if you don't think it's the right fit! Feel free to close this unless you're looking for more input. |
Another example for the fun of it... Valibot's
Here is the same regex if we used the regex('i')`
^( \g<ipv4> | \g<ipv6> )$
# Define subpatterns for use above
( (?<ipv4> \g<byte> ( \. \g<byte>){3} )
(?<ipv6>
(\g<hex4>:){7} \g<hex4>
| (\g<hex4>:){1,7} :
| (\g<hex4>:){1,6} :\g<hex4>
| (\g<hex4>:){1,5} (:\g<hex4>){1,2}
| (\g<hex4>:){1,4} (:\g<hex4>){1,3}
| (\g<hex4>:){1,3} (:\g<hex4>){1,4}
| (\g<hex4>:){1,2} (:\g<hex4>){1,5}
| \g<hex4>: (:\g<hex4>){1,6}
| : ( (:\g<hex4>){1,7} | : )
# With zone identifier
| fe80: (:[\da-f]{0,4}){0,4} % [\da-z]+
# Mixed addresses
| :: ( ffff ( :0{1,4} )? : )? \g<ipv4>
| (\g<hex4>:){1,4} : \g<ipv4>
)
(?<hex4> [\da-f]{1,4} )
(?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d )
){0}
`; Okay, that's still quite complicated, but the structure is much clearer. This can be edited by mortals, and those edits can be reviewed by mortals. And what do you know... by making this readable, I can now spot bugs. First, it fails to match some valid mixed IPv6/IPv4 addresses. For example, it doesn't match any of the following valid addresses:
It also thinks the following address is valid, which it is not (which it self-acknowledges by not matching it when the zone identifier isn't present):
Good luck to anyone who wants to fix these bugs in the original version of the regex! Personally, I don't want to touch it. 😖 And I'm in the 99th percentile of developers comfortable with reading and editing complex regexes. Also, since I can now read this regex, I can see that there are ways to simplify it. I'll avoid doing that here though since I'm already quite off topic. (Edit: Collecting related existing issues/PRs here since I'd be happy to come back to fixing this regex in the future: #206, #369.) |
Thanks for the detailed answers. I like the benefits and if you have the time, feel free to take care of the setup and create a PR. You can also create a draft PR to get early feedback on any questions you may have. Would you be interested in becoming Valibot's Head of Regex? 😎 If you are interested, I would contact you to request a review whenever we add or change a regex. |
Thanks for the offer and vote of confidence. I would probably say yes if I was an active Valibot user, but since I'm not, I'm not currently able to make the commitment. Feel free to contact me though if you have more specific regex questions.
Cool. 😎 If you're okay with leaving this open for a while, I'll try to come back to it and figure out a reasonable setup, probably after first putting better testing in place for the Babel plugin. |
Yes, no rush! Thanks for your contributions! 🙏 |
Thoughts on adding babel-plugin-transform-regex to
devDependencies
?This would allow changing e.g. the unreadable/unmaintainable
IPV4_REGEX
:/^(?:(?:[1-9]|1\d|2[0-4])?\d|25[0-5])(?:\.(?:(?:[1-9]|1\d|2[0-4])?\d|25[0-5])){3}$/u
To the much nicer:
This would then get transpiled into a native regex (you can try it here), without the need for any runtime dependency, any added run-time cost for users, or any interpolation when constructing the regex (which could cause issues with tree-shaking).
This is just an example, and some of the other regexes in
library/src/regex.ts
could benefit more significantly.(This is a follow up from #666, where interpolation was removed from regex changes due to tree-shaking concerns.)
The text was updated successfully, but these errors were encountered: