Fuzzy name matching #21

Open
art-w opened this issue Feb 6, 2024 · 0 comments
Owner

art-w commented Feb 6, 2024

Sherlodoc uses a compressed suffix tree to index value names and types. However, the search doesn't try to correct user typos, even though doing so would be efficient on this index data structure. For example, the query flter yields no results.

  • The search procedure happens in Db.String_automata.find. It could return a list of subtrees instead of a single one, in order to tolerate user typos (e.g. a missing character, a character replaced by another, or an extra character that should be removed)
  • Some care is required to ensure the correction produces understandable results. I would probably start by tolerating exactly one typo, and only on words of sufficient length, so that the typo correction isn't too aggressive, then refine this strategy with manual testing :)
  • Query.Name_cost would likely need adjustments to detect typo-corrected matches. It should work even without touching it, though: it will assume that the typo-corrected word was found in the documentation comment, which incurs a penalty that naturally pushes the result below exact matches with no typo correction
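
To make the first two points concrete, here is a minimal sketch of a one-typo-tolerant lookup. It is written against a plain, uncompressed trie of names, not against Sherlodoc's actual compressed suffix tree, and it does not use the real Db.String_automata API. The names trie, find_fuzzy, search and the min_length threshold of 4 are all hypothetical and chosen only for illustration. The idea is to walk the index with an edit budget and return every subtree reachable by spending at most that budget on a missing, replaced, or extra character.

```ocaml
(* Hypothetical, simplified index: a plain trie of names.
   This is NOT Sherlodoc's Db.String_automata representation. *)
module Char_map = Map.Make (Char)

type trie = { terminal : bool; children : trie Char_map.t }

let empty = { terminal = false; children = Char_map.empty }

let rec insert word i node =
  if i = String.length word then { node with terminal = true }
  else
    let c = word.[i] in
    let child =
      match Char_map.find_opt c node.children with
      | Some child -> child
      | None -> empty
    in
    { node with children = Char_map.add c (insert word (i + 1) child) node.children }

let of_list words = List.fold_left (fun t w -> insert w 0 t) empty words

(* Returns every (prefix, subtree) reachable by reading [query] from [root]
   while spending at most [budget] edits. The list may contain duplicates. *)
let find_fuzzy ~budget root query =
  let n = String.length query in
  let rec go node prefix i budget acc =
    if i = n then (prefix, node) :: acc
    else
      (* Exact step: follow the edge labelled with the next query character. *)
      let acc =
        match Char_map.find_opt query.[i] node.children with
        | Some child -> go child (prefix ^ String.make 1 query.[i]) (i + 1) budget acc
        | None -> acc
      in
      if budget = 0 then acc
      else
        (* Extra character in the query: skip it without moving in the trie. *)
        let acc = go node prefix (i + 1) (budget - 1) acc in
        Char_map.fold
          (fun c child acc ->
            let prefix = prefix ^ String.make 1 c in
            (* Missing character in the query: follow an edge, consume nothing. *)
            let acc = go child prefix i (budget - 1) acc in
            (* Replaced character: follow a different edge and consume one. *)
            if c <> query.[i] then go child prefix (i + 1) (budget - 1) acc
            else acc)
          node.children acc
  in
  go root "" 0 budget []

(* Collect all names stored at or below a subtree. *)
let rec words_below prefix node acc =
  let acc = if node.terminal then prefix :: acc else acc in
  Char_map.fold
    (fun c child acc -> words_below (prefix ^ String.make 1 c) child acc)
    node.children acc

(* Tolerate one typo, but only on queries of at least [min_length] characters. *)
let search ?(min_length = 4) root query =
  let budget = if String.length query >= min_length then 1 else 0 in
  find_fuzzy ~budget root query
  |> List.fold_left (fun acc (prefix, node) -> words_below prefix node acc) []
  |> List.sort_uniq String.compare

let index = of_list [ "filter"; "filter_map"; "flatten"; "fold_left"; "map" ]
let results = search index "flter" (* ["filter"; "filter_map"] *)
```

With this toy index, the query flter reaches the filter subtree by assuming one missing character, while a short query such as mep gets no correction because it is below the length threshold. A real implementation would also need to deduplicate overlapping subtrees and record that a correction was applied, so that the ranking can penalize it.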