Edge Case for json requests #81

Anwesh1 · 2021-04-13T06:01:50Z

Hey @JohnGiorgi, I've been bombarding semantic search with a lot of weird scenarios of inputs. I also sent a request for 30+ documents and it seems to work fine (which is great!!). I'm going to continue pushing weird requests to it but is there any specific thing that I should try instead that you think could break the code?
Let me know, thanks.

JohnGiorgi · 2021-04-13T13:30:25Z

The goal is to try to group your requests to cover different scenarios. Here's a few I can think of:

top_k
- greater than the possible number of document
- less than the possible number of documents
- equal to the possible number of documents
- 0 & less than 0 (is the error message informative?)
- Something that isn't an integer (float, str -- is the error message informative?)
query
- Provide a uid that doesn't exist (is the error message informative?)
- With text and uid
- With uid only
- Identical to one of the documents
documents
- Provide a uid that doesn't exist (is the error message informative?)
- With text and uid
- With uid only
- Identical to the query (it shouldn't appear in the results, is top k handled correctly in this case?)
- Providing no documents at all (I think this is supported, if not we would have to add it first)

I encourage you to be creative and expand this list!

If you are having trouble finding edge cases, you could move on to writing unit tests.

jvwong · 2021-04-14T13:24:00Z

Two questions based on trying a few examples:

What does top k do?
What if any, are the constraints on the POST request if I include text?

(should update README)

JohnGiorgi · 2021-04-14T18:38:41Z

@jvwong Does the readme on develop answer question 1? (see here, "The return value is a JSON representation of the top_k most similar documents (defaults to 10)"

I updated the docstring example in #82 so that it is more informative.

Anwesh1 · 2021-04-14T18:49:33Z

By the way, all the situations you suggested I check out, work. Except for when the uid provided doesn't exist.
This breaks the code producing an Internal Server error, trying to fix that.

JohnGiorgi · 2021-04-14T18:54:39Z

By the way, all the situations you suggested I check out, work. Except for when the uid provided doesn't exist.
This breaks the code producing an Internal Server error, trying to fix that.

Great! Yeah ideally we would catch this and return a more informative status code (maybe a 400 or 422) with a message (the "uid" provided does not appear to be a valid PMID)

jvwong · 2021-04-14T19:10:43Z

@jvwong Does the readme on develop answer question 1? (see here, "The return value is a JSON representation of the top_k most similar documents (defaults to 10)"

I updated the docstring example in #82 so that it is more informative.

I'm not clear on how to interpret the output based on the input:

e.g. I give you 1 query uid, 3 document uids, top_k = 20 --> I get 20 back, but where did the other 17 things come from?

JohnGiorgi · 2021-04-14T19:13:01Z

I'm not clear on how to interpret the output based on the input:
* e.g. I give you 1 query uid, 3 document uids, top_k = 20 --> I get 20 back, but where did the  other 17 things come from?

I see.

The other 17 things come from the index. During every request, we check if the query or documents are in the index. If not, we add them. They will be included in the search during subsequent requests.

I can think about where to explain this, probably in the readme somewhere.

jvwong · 2021-04-14T19:18:11Z

I'm not clear on how to interpret the output based on the input:
* e.g. I give you 1 query uid, 3 document uids, top_k = 20 --> I get 20 back, but where did the  other 17 things come from?
I see.

The other 17 things come from the index. During every request, we check if the query or documents are in the index. If not, we add them. They will be included in the search during subsequent requests.

I can think about where to explain this, probably in the readme somewhere.

I see. So when you provide a POST body with documents, you're basically saying add these to the index?
This sorta breaks from the original idea of: "Rank the provided query against the provided documents".

JohnGiorgi · 2021-04-14T19:21:09Z

I see. So when you provide a POST body with documents, you're basically saying add these to the index?

Yes

This sorta breaks from the original idea of: "Rank the provided query against the provided documents".

I see. Yeah, this is why I was trying to be such a stickler about the API/Schemas/Use cases during all the meetings.

I think it would be easy enough to support the original idea. Probably just a single param like use_index which could be set to False, and that would mean to only compare the query against documents. Already being tracked in #61. Do you want me to add this?

jvwong · 2021-04-14T19:28:29Z

This sorta breaks from the original idea of: "Rank the provided query against the provided documents".

I see. Yeah, this is why I was trying to be such a stickler about the API/Schemas/Use cases during all the meetings.

I think it would be easy enough to support the original idea. Probably just a single param like use_index which could be set to False, and that would mean to only compare the query against documents. Already being tracked in #61. Do you want me to add this?

No don't change anything code-wise, I was just trying to clarify the expected behaviour as it exists. The only ambiguity is what's in the index at the time that a client hits it.

Anwesh1 · 2021-04-14T23:55:29Z

Hey @JohnGiorgi, had a question regarding displaying the message. I pinpointed the location to fix the problem where uid is not valid to this place in the ncbi.py file:

semantic-search/semantic_search/ncbi.py

Lines 101 to 103 in a52492f

    
           if "PMID" not in record: 
        
               logging.warn(f"No PMID for {json.dumps(record)}") 
        
               continue

Essentially this only triggers for us when the PMID is not valid so when the if statement were to trigger, we would stop our iteration and display a message. I was confused as to how you actually display the message for the user. I think technically that should work.
Let me know if I'm wrong and I'd appreciate insight on how you get the message displayed. (ex: when you have the message display that top_k must be greater than 0, or something like that.)

JohnGiorgi · 2021-04-15T14:35:00Z

I'd appreciate insight on how you get the message displayed.

Check out the FastAPI documentation on handling errors: https://fastapi.tiangolo.com/tutorial/handling-errors/?h=erro. It should just be a matter of adding something like:

raise HTTPException(status_code=422, detail='the "uid" provided does not appear to be a valid PMID')

@jvwong, @maxkfranz, not sure which status_code is most appropriate here. I figure 400 of 422.

@Anwesh1 Is there a way to pinpoint which "uid" (provided by the user) actually triggered the error? I don't see that here

Anwesh1 · 2021-04-15T14:46:13Z

Line 102 where it states logging.warn, when changed to logging.error shows this:
ERROR:root:No PMID for {"id:": ["8463928683 Error occurred: The following PMID is not available: 938463928683"]}
So, when the record variable is accessed, it actually displays the uid that is causing problems.

JohnGiorgi · 2021-04-15T14:47:58Z

Cool. Can we access this as a string and pass it to the detail keyword of HTTPException?

So something like:

 if "PMID" not in record: 
     raise HTTPException(status_code=422, detail=f"No PMID for {json.dumps(record)}")

The only concern is that continue keyword. Looks like these errors were just being ignored previously, and now they are going to cause the code to stop and raise an error. Is that okay? (I don't really know how the ncbi module is supposed to work).

Anwesh1 · 2021-04-15T14:58:10Z

Essentially, that if block only triggers when we have a case of an invalid uid, so we don't actually need the continue here. We had the continue for the pubmed-dl library because we were bulk downloading and so we didn't mind if we skipped a few of them.

Anwesh1 · 2021-04-15T15:02:51Z

The error message looks like this:

This was referenced Apr 15, 2021

Develop #83

Closed

error thrown when provided invalid uid #84

Merged

jvwong mentioned this issue Jun 18, 2021

Flag to use index #61

Closed

JohnGiorgi closed this as completed Jun 19, 2021

JohnGiorgi reopened this Jun 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Edge Case for json requests #81

Edge Case for json requests #81

Anwesh1 commented Apr 13, 2021 •

edited

Loading

JohnGiorgi commented Apr 13, 2021 •

edited

Loading

jvwong commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

Anwesh1 commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

jvwong commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

jvwong commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

jvwong commented Apr 14, 2021

Anwesh1 commented Apr 14, 2021

JohnGiorgi commented Apr 15, 2021 •

edited

Loading

Anwesh1 commented Apr 15, 2021

JohnGiorgi commented Apr 15, 2021 •

edited

Loading

Anwesh1 commented Apr 15, 2021

Anwesh1 commented Apr 15, 2021

Edge Case for json requests #81

Edge Case for json requests #81

Comments

Anwesh1 commented Apr 13, 2021 • edited Loading

JohnGiorgi commented Apr 13, 2021 • edited Loading

jvwong commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 • edited Loading

Anwesh1 commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 • edited Loading

jvwong commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 • edited Loading

jvwong commented Apr 14, 2021

JohnGiorgi commented Apr 14, 2021 • edited Loading

jvwong commented Apr 14, 2021

Anwesh1 commented Apr 14, 2021

JohnGiorgi commented Apr 15, 2021 • edited Loading

Anwesh1 commented Apr 15, 2021

JohnGiorgi commented Apr 15, 2021 • edited Loading

Anwesh1 commented Apr 15, 2021

Anwesh1 commented Apr 15, 2021

Anwesh1 commented Apr 13, 2021 •

edited

Loading

JohnGiorgi commented Apr 13, 2021 •

edited

Loading

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

JohnGiorgi commented Apr 14, 2021 •

edited

Loading

JohnGiorgi commented Apr 15, 2021 •

edited

Loading

JohnGiorgi commented Apr 15, 2021 •

edited

Loading