Flag to use index #61

Closed
JohnGiorgi opened this issue Mar 11, 2021 · 7 comments

Comments

@JohnGiorgi
Collaborator

Each query should allow for a flag (say, use_index) that lets the user choose whether or not to use an index.

@jvwong
Member

jvwong commented Jun 18, 2021

@JohnGiorgi Do you think it's feasible/trivial to handle the following scenario: a request is made with any of the existing required information (uid, text), but the response is restricted to the uids listed under documents in the original request. In particular, the index can still be updated and queried as usual.

In my mind, this just means picking off the uids in the document request and sending those back, but I'm not sure whether all docs are ranked.

Refs #81 (comment)

@JohnGiorgi
Collaborator Author

JohnGiorgi commented Jun 19, 2021

I think I understand. Is this the exact same idea as #81 (comment), where there's some parameter passed in with a request, e.g. use_index, and if it's False, we only consider the uids in documents?

Alternatively, we could assume that if documents is not empty, we should only return uids from it and not the index.

I think I could implement either pretty quickly.
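To make the two options concrete, here is a minimal sketch of how each might decide which uids are eligible for the response. The Search dataclass, its text field, and the helper names are hypothetical (only documents and use_index come from this thread), and documents is simplified to a list of uids.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Search:
    """Hypothetical request model; only `documents` and `use_index` come from the thread."""
    text: str
    documents: List[str] = field(default_factory=list)  # uids sent with the request
    use_index: bool = True


def candidates_with_flag(search: Search, index_uids: List[str]) -> List[str]:
    """Option 1: an explicit flag. use_index=False restricts results to `documents`."""
    if search.use_index:
        # Index uids first, then any request uids not already indexed (order kept, no duplicates).
        return list(dict.fromkeys(index_uids + search.documents))
    return list(search.documents)


def candidates_implied(search: Search, index_uids: List[str]) -> List[str]:
    """Option 2: no flag. A non-empty `documents` list implies "only these uids"."""
    if search.documents:
        return list(search.documents)
    return list(index_uids)
```

Option 2 needs no new field, but under that rule a client can no longer send documents and still get index results back; option 1 keeps that possibility open.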

@jvwong
Member

jvwong commented Jun 21, 2021

> I think I understand. Is this the exact same idea as #81 (comment), where there's some parameter passed in with a request, e.g. use_index, and if it's False, we only consider the uids in documents?

> Alternatively, we could assume that if documents is not empty, we should only return uids from it and not the index.

Yeah, the key point is that only the set of provided uids gets returned to the client. The search could still use the index.

> I think I could implement either pretty quickly.

Sure, that would be helpful. Let me know if I can help.

@jvwong
Member

jvwong commented Dec 13, 2021

I'll take a stab at this. To test it, I guess we would need some mock/spy?

@JohnGiorgi
Collaborator Author

Right, so I think you would basically add an argument to Search that is checked by the search function in main.py.

Then you could copy and modify dummy_request_with_test() in conftest.py and add a test in test_main.py to ensure the new argument does what it is supposed to.
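On the mock/spy question, a minimal pytest-style sketch: unittest.mock.MagicMock records every call made on it, so it can serve as a spy that checks whether the index was queried. run_search and rank_documents are hypothetical stand-ins (the real search function in main.py and the fixtures in conftest.py will look different); the sketch assumes the index and the subset-ranking step are passed in so they can be swapped for mocks.

```python
from typing import Any, Callable, List
from unittest.mock import MagicMock


def run_search(query: str, documents: List[str], use_index: bool,
               index: Any, rank_documents: Callable[[str, List[str]], List[str]]) -> List[str]:
    """Hypothetical stand-in for the search function in main.py."""
    if use_index:
        return index.search(query)
    # Rank only the documents sent with the request; the index is never queried.
    return rank_documents(query, documents)


def test_use_index_false_skips_index() -> None:
    index = MagicMock()  # records every call, so it doubles as a spy
    rank_documents = MagicMock(return_value=["d2", "d1"])
    result = run_search("query", ["d1", "d2"], False, index, rank_documents)
    index.search.assert_not_called()
    rank_documents.assert_called_once_with("query", ["d1", "d2"])
    assert result == ["d2", "d1"]


def test_use_index_true_queries_index() -> None:
    index = MagicMock()
    index.search.return_value = ["a"]
    rank_documents = MagicMock()
    assert run_search("query", ["d1"], True, index, rank_documents) == ["a"]
    index.search.assert_called_once_with("query")
    rank_documents.assert_not_called()
```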

@jvwong
Member

jvwong commented Dec 13, 2021

Perhaps this wasn't the issue I thought it was. What I was interested in was this scenario: you send the search, say, 10 'document' uids, and you want to rank those documents, and only those documents, in the result.

I think many users have asked for this 'subset' search (here, here, here), and I think the answer is to just set top_k high enough to get all the results and filter post hoc. Does this make sense?

@JohnGiorgi
Collaborator Author

JohnGiorgi commented Dec 13, 2021

Ah right, I forgot FAISS doesn't let you filter. Yes, so you could perform the search on the whole index, then just filter the results so that they only contain the UIDs from documents in the query.
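A minimal sketch of the search-everything-then-filter approach, using a pure-Python brute-force inner-product search as a stand-in for a FAISS flat index. With real FAISS, the analogous call would be index.search(xq, index.ntotal), which returns distances and row ids for every indexed vector. search_index, ranked_subset, and row_to_uid are hypothetical names.

```python
from typing import List, Sequence, Tuple


def search_index(vectors: Sequence[Sequence[float]], query: Sequence[float],
                 k: int) -> List[Tuple[float, int]]:
    """Brute-force inner-product search returning (score, row) pairs, best first."""
    scores = [(sum(q * v for q, v in zip(query, vec)), row)
              for row, vec in enumerate(vectors)]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores[:k]


def ranked_subset(vectors: Sequence[Sequence[float]], row_to_uid: Sequence[str],
                  query: Sequence[float], subset_uids: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank the whole index, then keep only the uids the client sent, in rank order."""
    wanted = set(subset_uids)
    # k = number of indexed vectors, so no requested uid is cut off by a small top_k.
    hits = search_index(vectors, query, k=len(vectors))
    return [(row_to_uid[row], score) for score, row in hits if row_to_uid[row] in wanted]
```

Filtering a small k instead (for example k=1) could drop every requested uid, which is why the search covers the whole index. The trade-off is that each query produces a result list as long as the index before filtering.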

You could add a new argument, or just overload top_k. E.g., if top_k is None, you do the above. You could even make top_k = None the default setting.
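A minimal sketch of overloading top_k, where None (the default, as suggested) means "return the requested documents in rank order". select_results, the Hit alias, and the fallback to the full ranking when documents is empty are assumptions, not the project's code; ranked is assumed to already hold every hit from a whole-index search, best first.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, float]  # (uid, score)


def select_results(ranked: List[Hit], documents: List[str],
                   top_k: Optional[int] = None) -> List[Hit]:
    """Hypothetical dispatch that overloads top_k.

    `ranked` holds every hit from a search over the whole index, best first.
    """
    if top_k is None:
        if not documents:
            return list(ranked)  # nothing to restrict to: return the full ranking
        wanted = set(documents)
        return [hit for hit in ranked if hit[0] in wanted]
    return ranked[:top_k]
```

One cost of overloading is that top_k=None then behaves differently depending on whether documents is empty; a dedicated argument would keep the two behaviours separate.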
