Is your feature request related to a problem? Please describe.
I am unable to use relaxed query matching such as wildcards, regular expressions, and fuzzy matching against indexed fields and inside their related query operators such as phrase/near/onear.
Describe the solution you'd like
Add support for relaxed query terms against indexed fields.
Ideally, support analysis where possible (wildcard terms can be lowercased, have character normalization applied, etc.).
Add the ability to set the maximum number of term expansions, where hitting that limit either raises an error or acts as a signal to stop expanding.
Be smart about duplicated relaxed terms: for example, ((a* NEAR b) OR (a* ONEAR c)) should perform the expensive term dictionary scan for a* only once.
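To make the last three points concrete, here is a minimal sketch of the requested behaviour, not a description of how Vespa works internally. `TermExpander`, `ExpansionLimitExceeded`, `max_expansions`, and `fail_on_limit` are hypothetical names, the term dictionary is modelled as a sorted Python list, analysis is reduced to lowercasing, and only trailing wildcards such as a* are handled.

```python
import bisect


class ExpansionLimitExceeded(Exception):
    """Raised when a relaxed term expands to more terms than allowed."""


class TermExpander:
    """Expands trailing wildcards such as 'a*' against a sorted term dictionary."""

    def __init__(self, dictionary, max_expansions, fail_on_limit=True):
        self.terms = sorted(set(dictionary))
        self.max_expansions = max_expansions
        self.fail_on_limit = fail_on_limit
        self.scans = 0     # number of dictionary scans actually performed
        self._cache = {}   # analyzed pattern -> tuple of expanded terms

    @staticmethod
    def analyze(pattern):
        # Stand-in for real analysis (assumption): lowercase only.
        return pattern.lower()

    def expand(self, pattern):
        key = self.analyze(pattern)
        if key in self._cache:
            # Duplicate relaxed term: reuse the earlier scan result.
            return self._cache[key]
        self.scans += 1
        prefix = key.rstrip("*")
        start = bisect.bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break  # sorted order: no later term can match
            if len(matches) == self.max_expansions:
                if self.fail_on_limit:
                    raise ExpansionLimitExceeded(key)
                break  # treat the limit as a signal to stop expanding
            matches.append(term)
        self._cache[key] = tuple(matches)
        return self._cache[key]


expander = TermExpander(["apple", "avocado", "banana", "cherry"], max_expansions=10)
left = expander.expand("a*")   # needed by (a* NEAR b)
right = expander.expand("A*")  # needed by (a* ONEAR c); same analyzed term, served from cache
```

After both calls `expander.scans` is 1, because the second lookup normalizes to the same key and never touches the dictionary again.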
Describe alternatives you've considered
Using attributes, which are in-memory and don't work with phrase/near/onear.
Doing the expansion outside of the engine or in a query component.
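For illustration, the second alternative could look roughly like the sketch below. It assumes the caller already holds a vocabulary for the field outside the engine, which is exactly what makes this approach awkward in practice. `expand_outside_engine` is a hypothetical helper, and the output uses the informal operator notation from this issue rather than any real query language.

```python
import re


def expand_outside_engine(query, vocabulary):
    """Replace each trailing-wildcard token with an OR group of vocabulary terms."""
    vocab = sorted(set(vocabulary))

    def replace(match):
        prefix = match.group(1).lower()
        terms = [t for t in vocab if t.startswith(prefix)]
        if not terms:
            return match.group(0)  # leave the wildcard untouched if nothing matches
        return "(" + " OR ".join(terms) + ")"

    return re.sub(r"\b(\w+)\*", replace, query)


rewritten = expand_outside_engine(
    "(wast* NEAR management)", ["waste", "wasted", "water"]
)
print(rewritten)
```

The rewritten query grows with the number of matching terms, and the vocabulary has to be kept in sync with the index by some external process.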
Additional context
Combining wildcards and proximity operations against large free-text fields is a very popular scenario in enterprise search use cases.
We have a similar situation where fuzzy matching against an indexed array<string> field in streaming mode would be very convenient.
If we just try fuzzy matching, we get the following error:
"FUZZY(waste management,1,0,false) toc_label:waste management field is not a string attribute"
So then we could try n-gram matching; however:
n-gram matching is not supported for streaming search
We could try substring/prefix matching, which is a slight improvement but still doesn't handle typos.
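As a small illustration of why prefix matching does not cover typos, the sketch below compares a prefix check with an edit-distance check at a maximum distance of 1. `within_edit_distance` is a hypothetical helper written for this comment, using plain Levenshtein distance; it is not a Vespa API, and the misspelling is an invented example.

```python
def within_edit_distance(a, b, max_edits):
    """Return True if the Levenshtein distance between a and b is <= max_edits."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1] <= max_edits


indexed = "waste management"
typo = "waste managment"  # one missing letter

prefix_hit = indexed.startswith(typo)               # prefix matching misses the typo
fuzzy_hit = within_edit_distance(typo, indexed, 1)  # one edit away, so fuzzy matching hits
```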
So then our only other option currently is a synthetic string attribute field stored outside the document:
field myStringArrayAttribute type array<string> {
    indexing: input myStringArray | attribute
}
But then the string field would be stored in memory, significantly increasing memory usage and defeating the point of using streaming mode. Is that understanding correct?
It would be great if Vespa could support an option to help us in this situation.