[Feature Request] Ability to augment private suffix section #2172

Open
samczsun opened this issue Oct 2, 2024 · 2 comments

Comments

samczsun commented Oct 2, 2024

First of all, thank you for this library, it's been a breath of fresh air over psl.

That being said, the PSL is (and likely always will be) incomplete, and third parties cannot add entries because of its policy of only accepting changes from domain owners. This means that anyone consuming the PSL as a heuristic for hostnames containing user-generated content will inevitably need to augment the list with additional entries.

I see that this was discussed previously in #1486, but it's been almost 2 years and I wanted to see if you'd be willing to revisit the topic. While it's possible to hack around the library to support this feature, official support would be greatly appreciated for those who are willing to take the performance hit.
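For illustration, the kind of workaround meant here can be sketched as a wrapper that checks a custom suffix set on every lookup. This is only a sketch: `augmentedSuffix` and `extraSuffixes` are hypothetical names, and `builtinSuffix` stands in for whatever public suffix the library reports for the hostname.

```typescript
// Hypothetical helper, not part of tldts. `builtinSuffix` is the public
// suffix the library already found (or null if it found none).
function augmentedSuffix(
  hostname: string,
  builtinSuffix: string | null,
  extraSuffixes: ReadonlySet<string>,
): string | null {
  const labels = hostname.toLowerCase().split(".");
  // Try candidates from longest to shortest, so the first hit is the
  // longest custom suffix that matches.
  for (let i = 0; i < labels.length; i += 1) {
    const candidate = labels.slice(i).join(".");
    if (extraSuffixes.has(candidate)) {
      // Both strings are suffixes of the same hostname, so the longer
      // string is the more specific rule.
      if (builtinSuffix === null || candidate.length > builtinSuffix.length) {
        return candidate;
      }
      break;
    }
  }
  return builtinSuffix;
}
```

The extra scan runs on every call, which is the performance hit mentioned above.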

The biggest thing that would help is simply publishing the tldts-utils package with exports so people can make use of the PSL parser as well as the trie builder at runtime.

However, some additional minor tweaks would make the developer experience even better:

  • Exporting the `suffixLookup` function from tldts so that it has proper types (unless I'm doing it wrong). `import suffixLookup from 'tldts/dist/types/src/suffix-trie'` doesn't seem to work, while `import suffixLookup from 'tldts/dist/es6/src/suffix-trie'` doesn't come with typings.
  • Exporting the `FLAG` enum so that it can be used with `parseImpl`, instead of hardcoding the constant.

remusao (Owner) commented Oct 3, 2024

Hi @samczsun,

Thanks for reaching out. Let's discuss the options to extend the built-in list. First, I am assuming that we only want to add new suffixes and not delete existing ones.

Let's focus on tldts (the main package) and ignore tldts-experimental for now. If the solution we find works for both, even better, but given that they have very different internal representations, it is probably best to focus our efforts on the most used entrypoint.

What do you think about adding a new function to the public API that would allow patching the internal data structure with new suffixes? For example:

```typescript
// Proposed API, not yet implemented.
import { patchTldtsWithNewSuffixes } from 'tldts';

patchTldtsWithNewSuffixes([
  'foo.bar.baz',
  'bar.baz',
  // ...
]);
```

This function would need to be invoked only once and would update the internal representation of tldts. The operation would be idempotent, so calling it multiple times would be safe as well. We might need a slightly more complicated API if we want to distinguish between the ICANN and PRIVATE sections, etc., but we can figure this out if the approach seems reasonable enough.

I have not thought much about the implementation details yet, but I think it should be possible to make it work. The benefit is that the new suffixes would come with no additional overhead.
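To make the idea concrete, here is a minimal sketch of an in-place, idempotent patch on a toy trie keyed by reversed labels. It is an assumption for illustration only: `TrieNode`, `patchTrie` and `lookupSuffix` are hypothetical names, and the real internal representation of tldts may look quite different.

```typescript
// Toy suffix trie keyed by labels read right to left.
interface TrieNode {
  isSuffix: boolean;
  children: Map<string, TrieNode>;
}

const newNode = (): TrieNode => ({ isSuffix: false, children: new Map() });

// Hypothetical patch function: inserts suffixes into an existing trie.
function patchTrie(root: TrieNode, suffixes: readonly string[]): void {
  for (const suffix of suffixes) {
    let node = root;
    // 'foo.bar.baz' is inserted as baz -> bar -> foo.
    for (const label of suffix.toLowerCase().split(".").reverse()) {
      let child = node.children.get(label);
      if (child === undefined) {
        child = newNode();
        node.children.set(label, child);
      }
      node = child;
    }
    // Re-adding a suffix only sets the same flag again, so the patch is idempotent.
    node.isSuffix = true;
  }
}

// Longest-match lookup: walks at most one node per hostname label.
function lookupSuffix(root: TrieNode, hostname: string): string | null {
  const labels = hostname.toLowerCase().split(".");
  let node = root;
  let matched = 0;
  let bestCount = 0;
  for (const label of [...labels].reverse()) {
    const child = node.children.get(label);
    if (child === undefined) break;
    node = child;
    matched += 1;
    if (node.isSuffix) bestCount = matched;
  }
  return bestCount === 0 ? null : labels.slice(labels.length - bestCount).join(".");
}
```

Because patched entries live in the same structure as the existing ones, a lookup walks one node per hostname label no matter how many suffixes were added.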

Best,

samczsun (Author) commented Oct 6, 2024

That would be amazing! Although the perfectionist in me yearns for completeness, I think that realistically it's unlikely that removing suffixes will be needed. As for distinguishing between ICANN and private, it seems safe to assume that the PSL maintainers will stay on top of ICANN changes, so we can just assume that all changes will be for the private section.

It might be nice to have a way to "reset" the internal state (given that the API will be idempotent, it would be necessary to have a way to undo a custom suffix without restarting the process). It would also be nice for the API to support wildcard suffixes in the same way the current PSL does. This is why I originally asked about exporting the parser: it felt appropriate to store our custom suffixes in the same format.
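As a rough sketch of both requests, custom rules could be kept in PSL format (`//` comments, `*.` wildcards, `!` exceptions) in a registry that can be cleared. The names `CustomSuffixes`, `parsePslRules` and `ruleMatches` are hypothetical, and a linear scan is used for brevity where a real implementation would more likely patch a trie.

```typescript
interface CustomRule {
  labels: string[];
  isException: boolean;
}

// Parses PSL-formatted text: one rule per line, up to the first whitespace.
function parsePslRules(text: string): CustomRule[] {
  const rules: CustomRule[] = [];
  for (const line of text.split("\n")) {
    const rule = line.trim().split(/\s+/)[0] ?? "";
    if (rule === "" || rule.startsWith("//")) continue; // blank line or comment
    const isException = rule.startsWith("!");
    const body = isException ? rule.slice(1) : rule;
    rules.push({ labels: body.toLowerCase().split("."), isException });
  }
  return rules;
}

// True if the rule matches the rightmost labels of the host; '*' matches any one label.
function ruleMatches(rule: CustomRule, host: string[]): boolean {
  if (rule.labels.length > host.length) return false;
  const offset = host.length - rule.labels.length;
  return rule.labels.every((label, i) => label === "*" || label === host[offset + i]);
}

class CustomSuffixes {
  private rules: CustomRule[] = [];

  add(text: string): void {
    this.rules.push(...parsePslRules(text));
  }

  // Drops every custom rule without restarting the process.
  reset(): void {
    this.rules = [];
  }

  suffixOf(hostname: string): string | null {
    const host = hostname.toLowerCase().split(".");
    let bestLength = 0;
    for (const rule of this.rules) {
      if (!ruleMatches(rule, host)) continue;
      // An exception rule wins outright; its suffix drops the leftmost label.
      if (rule.isException) {
        return host.slice(host.length - rule.labels.length + 1).join(".");
      }
      bestLength = Math.max(bestLength, rule.labels.length);
    }
    return bestLength === 0 ? null : host.slice(host.length - bestLength).join(".");
  }
}
```

Keeping the custom rules separate from the built-in list is what makes `reset()` trivial in this sketch; with an in-place patch, the library would need to keep a copy of the original data or track what was added.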
