Fuzzy name matching #21

art-w · 2024-02-06T11:45:58Z

Sherlodoc uses a compressed suffix tree to index the value names and types. However, the search doesn't try to correct user typos, even though it would be efficient to do so on this index data structure. For example, the query flter yields no results.

The search procedure happens in Db.String_automata.find and could return a list of subtrees to tolerate user typos (e.g. a missing character, a character replaced by another, or an extra character to remove).
Some care is required to ensure the correction produces understandable results. I would probably start by tolerating exactly one typo, on words of sufficient length, to avoid the typo correction being too aggressive, and then refine this strategy with manual testing :)
Query.Name_cost would likely need adjustments to detect typo-corrected matches. It should work even without touching this, though: it will assume that the typo-corrected word was found in the documentation comment, which introduces a penalty that naturally pushes the result below exact matches with no typo correction.
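
To make the idea concrete, here is a rough sketch of a one-typo lookup that returns a list of subtrees. It is not the actual Db.String_automata code: it assumes a plain, uncompressed character trie instead of the compressed suffix tree, and the names `trie`, `find`, `search` and `min_fuzzy_length` (with its threshold of 4) are all hypothetical, picked only for illustration.

```ocaml
(* Simplified stand-in for the real index: an uncompressed trie. *)
module CMap = Map.Make (Char)

type trie = { terminal : bool; children : trie CMap.t }

let empty = { terminal = false; children = CMap.empty }

let rec insert word i t =
  if i = String.length word then { t with terminal = true }
  else
    let c = word.[i] in
    let child = Option.value (CMap.find_opt c t.children) ~default:empty in
    { t with children = CMap.add c (insert word (i + 1) child) t.children }

let of_list words = List.fold_left (fun t w -> insert w 0 t) empty words

let extend path c = path ^ String.make 1 c

(* [find ~budget query i path t] returns every (corrected prefix, subtree)
   reachable by reading [query] from position [i], using at most [budget]
   typos. The same subtree may be returned more than once. *)
let rec find ~budget query i path t =
  if i = String.length query then [ (path, t) ]
  else
    let c = query.[i] in
    let exact =
      match CMap.find_opt c t.children with
      | Some child -> find ~budget query (i + 1) (extend path c) child
      | None -> []
    in
    if budget = 0 then exact
    else
      let budget = budget - 1 in
      (* Extra character in the query: drop it and stay on this node. *)
      let extra = find ~budget query (i + 1) path t in
      let via_children =
        CMap.fold
          (fun c' child acc ->
            let path = extend path c' in
            (* Missing character: follow [c'] without consuming the query. *)
            let missing = find ~budget query i path child in
            (* Replaced character: follow [c'] in place of [c]. *)
            let replaced =
              if c' = c then [] else find ~budget query (i + 1) path child
            in
            missing @ replaced @ acc)
          t.children []
      in
      exact @ extra @ via_children

(* Collect every indexed word below a subtree. *)
let rec words path t acc =
  let acc = if t.terminal then path :: acc else acc in
  CMap.fold (fun c child acc -> words (extend path c) child acc) t.children acc

(* Assumed threshold: shorter queries get no typo correction. *)
let min_fuzzy_length = 4

let search t query =
  let budget = if String.length query >= min_fuzzy_length then 1 else 0 in
  find ~budget query 0 "" t
  |> List.fold_left (fun acc (path, sub) -> words path sub acc) []
  |> List.sort_uniq String.compare

let index = of_list [ "filter"; "filter_map"; "find"; "fold"; "map" ]
let results = search index "flter"
```

With this toy index, `search index "flter"` recovers the entries under filter, while a short query such as fnd is left uncorrected because it falls below the length threshold. The final `List.sort_uniq` is there because different correction paths can reach the same subtree; a real implementation would probably want to deduplicate subtrees before expanding them.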
