Feature Request: consider splitting dictionary in multiple ones #597

Open
ccoVeille opened this issue Feb 5, 2025 · 4 comments

Comments

@ccoVeille
Contributor

I like how @Jason3S (hi Jason 👋) structured the cspell project

https://github.com/streetsidesoftware/cspell-dicts

Here I'm not asking for a separate repository as cspell has, but at least to split a dictionary that has grown too big.

@hippietrail
Contributor

Regarding our recent chat on Discord, here's the thread where I posted some related thoughts a few days ago: #473

@Jason3S

Jason3S commented Feb 5, 2025

@hippietrail, @elijah-potter

Cool project!

@ccoVeille
Contributor Author

Here is a thread about "should we accept min as a word".

This is the same problem the cspell project faced. It is why they split their dictionaries, and why I suggested the idea for this project too.

A developer would want min accepted, while a non-technical writer would not.

Please take a look at how cspell-dicts is organized, and how individual dictionaries can be enabled or disabled in cspell.

Also, almost all (maybe all?) of the cspell-dicts are MIT-licensed, so they could be reused by a project like yours.

Originally posted by @ccoVeille in #596 (comment)

So the idea could be not only to split the dictionary, but also to consider external sources of data to feed Harper's dictionaries.
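
To make the enable/disable idea concrete, here is a minimal sketch in Python. It is not Harper's or cspell's actual API: the `LayeredDictionary` class, the layer names, and the word lists are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredDictionary:
    """Hypothetical container: named word lists that can be switched on or off."""

    layers: dict[str, frozenset[str]] = field(default_factory=dict)
    enabled: set[str] = field(default_factory=set)

    def add_layer(self, name: str, words: list[str], enable: bool = True) -> None:
        # Store words lowercased so lookups are case-insensitive.
        self.layers[name] = frozenset(w.lower() for w in words)
        if enable:
            self.enabled.add(name)

    def enable(self, name: str) -> None:
        if name not in self.layers:
            raise KeyError(f"unknown dictionary layer: {name}")
        self.enabled.add(name)

    def disable(self, name: str) -> None:
        self.enabled.discard(name)

    def contains(self, word: str) -> bool:
        # A word is accepted if any currently enabled layer lists it.
        w = word.lower()
        return any(w in self.layers[name] for name in self.enabled)


def build_example() -> LayeredDictionary:
    # Illustrative word lists only; real dictionaries would be loaded from files.
    d = LayeredDictionary()
    d.add_layer("english", ["the", "minimum", "value"])
    d.add_layer("software-terms", ["min", "stdout", "printf"], enable=False)
    return d
```

With this layout, a non-technical writer keeps only `english` enabled and gets min flagged, while a developer calls `enable("software-terms")` and gets it accepted, without the English word list ever containing it.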

@hippietrail
Contributor

> Here is a thread about "should we accept min as a word".
>
> This is the same problem the cspell project faced. It is why they split their dictionaries, and why I suggested the idea for this project too.
>
> A developer would want min accepted, while a non-technical writer would not.
>
> Please take a look at how cspell-dicts is organized, and how individual dictionaries can be enabled or disabled in cspell.
>
> Also, almost all (maybe all?) of the cspell-dicts are MIT-licensed, so they could be reused by a project like yours.
>
> Originally posted by @ccoVeille in #596 (comment)
>
> So the idea could be not only to split the dictionary, but also to consider external sources of data to feed Harper's dictionaries.

There are going to be lots of identifiers in comments, as well as technical terms that are not really English words or don't have settled spellings. Think "hash map" vs "hashmap" vs "HashMap" vs "hash_map", not to mention things like printf, stdout, and const.
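
As a rough illustration of how those spelling variants relate, here is a small Python sketch that splits an identifier on case changes, underscores, and spaces, then joins the parts into one lookup key. The function names `split_identifier` and `canonical_key` are made up for this example and do not come from Harper or cspell.

```python
import re

# Matches, in order: an acronym run ("HTTP" in "HTTPServer"),
# a lowercase or Capitalized word ("hash", "Map"), or a run of digits.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier: str) -> list[str]:
    """Split camelCase, PascalCase, snake_case, or spaced text into lowercase parts."""
    return [m.group(0).lower() for m in _WORD.finditer(identifier)]


def canonical_key(term: str) -> str:
    """Collapse spelling variants of a term into one key for dictionary lookup."""
    return "".join(split_identifier(term))


variants = ["hash map", "hashmap", "HashMap", "hash_map"]
keys = {canonical_key(v) for v in variants}
```

All four variants collapse to the single key `hashmap`, so a technical dictionary would need only one entry for them. Note that "hashmap" itself cannot be split into "hash" and "map" without a word list, so this sketch only normalizes it.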

On the one hand we don't want to mark them all as possible errors, and on the other hand we don't want to pollute the English dictionary with lots of that stuff.

Separately but relatedly, any class, object, interface, variable, constant, or function name will be referenced in comments. So ideally, in the future, there would be a way to gather all of those from the current file or the project/repo into an additional dynamic dictionary. But it may also be worth finding English mistakes inside identifiers too. I know I've contributed fixes for that sort of thing to OSS projects before.
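
Both ideas can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not a proposal for Harper's implementation: it assumes the input is code with comments already stripped (that step is language-specific and omitted), it treats keywords as identifiers too, and the names `collect_identifiers` and `unknown_parts` plus the tiny `english` word set are hypothetical.

```python
import re

# Anything that looks like an identifier in most C-like languages.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Word-like parts inside an identifier: acronym runs or (Capitalized) words.
_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def collect_identifiers(code: str) -> set[str]:
    """Build a dynamic dictionary from every identifier-like token in the code."""
    return set(_IDENT.findall(code))


def unknown_parts(identifiers: set[str], english: set[str]) -> dict[str, list[str]]:
    """Map each identifier to the parts of it that are not known English words."""
    report: dict[str, list[str]] = {}
    for ident in sorted(identifiers):
        bad = [p.lower() for p in _PART.findall(ident) if p.lower() not in english]
        if bad:
            report[ident] = bad
    return report


code = "fn recieve_message(buf: HashMap) {}"
dynamic_dictionary = collect_identifiers(code)
english = {"receive", "message", "hash", "map"}
report = unknown_parts(dynamic_dictionary, english)
```

Here `dynamic_dictionary` lets a comment mention `recieve_message` without being flagged, while `report` still points out that the part "recieve" is not an English word. Abbreviations such as "buf" and keywords such as "fn" are reported as well, which is where a separate technical dictionary would help.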
