Score and return only documents provided #114

Merged Dec 14, 2021 (6 commits)
Changes from 2 commits
4 changes: 4 additions & 0 deletions README.md
@@ -102,6 +102,10 @@ The return value is a JSON representation of the `top_k` most similar documents

If `"text"` is not provided, we assume `"uid"`s are valid PMIDs and fetch the title and abstract text before embedding, indexing and searching.

- Notes on optional parameters
- `top_k`: A positive integer (default: 10) that limits the search results to that many of the most similar neighbours (articles).
- `docs_only`: A boolean (default: false). When true, the service returns scores only for the provided `documents`, and `top_k` is disregarded.
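
As a rough illustration of how the two options interact, the sketch below assembles two request bodies for the `/search` endpoint. The uids and text are made-up placeholders, and `build_search_request` is a hypothetical helper, not part of the service.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_request(
    query: Dict[str, str],
    documents: List[Dict[str, str]],
    top_k: Optional[int] = None,
    docs_only: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a JSON-serialisable body for POST /search."""
    body: Dict[str, Any] = {"query": query, "documents": documents}
    if top_k is not None:
        # Mirrors the documented constraint that top_k is a positive integer
        if top_k <= 0:
            raise ValueError("top_k must be greater than 0")
        body["top_k"] = top_k
    if docs_only:
        # top_k is disregarded by the service when docs_only is true
        body["docs_only"] = True
    return body


# Placeholder uids and text, for illustration only
query = {"uid": "1", "text": "Example query text."}
documents = [
    {"uid": "2", "text": "First candidate."},
    {"uid": "3", "text": "Second candidate."},
]

top_three = build_search_request(query, documents, top_k=3)
scored_only = build_search_request(query, documents, docs_only=True)

print(json.dumps(scored_only, indent=2))
```

The first body asks for the three nearest neighbours; the second asks only for scores of the two documents supplied.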

### Running via Docker

#### Setup
6 changes: 3 additions & 3 deletions semantic_search/main.py
@@ -132,7 +132,7 @@ def index(request: Request):
@app.post("/search", tags=["Search"], response_model=List[TopMatch])
async def search(search: Search):
"""Returns the `top_k` most similar documents to `query` from the provided list of `documents`
-and the index.
+and the index. When docs_only is True, returns all `documents` provided, and disregards `top_k`.
"""
ids = [int(doc.uid) for doc in search.documents]
texts = [document.text for document in search.documents]
@@ -167,7 +167,7 @@ async def search(search: Search):
# Can't search for more items than exist in the index
top_k = min(num_indexed, search.top_k)

-if search.use_docs:
+if search.docs_only:
top_k = num_indexed

# Perform the search
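
The clamping in this hunk can be read as a small pure function. The sketch below is a hypothetical restatement (`effective_top_k` does not exist in the repository); the reason given in the comment for widening the search under `docs_only` is an inference from the later hunk, not something the diff states.

```python
def effective_top_k(requested_top_k: int, num_indexed: int, docs_only: bool) -> int:
    """Hypothetical helper restating the top_k logic from the hunk above."""
    # Can't search for more items than exist in the index
    top_k = min(num_indexed, requested_top_k)
    # Assumption: with docs_only the whole index is searched so that every
    # provided document is guaranteed to appear in the results
    if docs_only:
        top_k = num_indexed
    return top_k


print(effective_top_k(10, 4, docs_only=False))
print(effective_top_k(10, 100, docs_only=True))
```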
@@ -177,7 +177,7 @@ async def search(search: Search):
top_k_scores = top_k_scores.reshape(-1).tolist()

# Pick out results for the incoming ids in search.documents
-if search.use_docs:
+if search.docs_only:
documents_positions = [top_k_indicies.index(id) for id in ids]
top_k_indicies = ids
top_k_scores = [top_k_scores[position] for position in documents_positions]
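
A self-contained sketch of the selection step above, using plain lists in place of the search output. `select_provided` is a hypothetical name, the ids and scores are invented, and the sketch assumes ids are unique within the results. It swaps the repeated `list.index` calls for a dictionary lookup and raises a descriptive error when a provided id is missing from the results.

```python
from typing import Dict, List, Tuple


def select_provided(
    result_ids: List[int],
    result_scores: List[float],
    provided_ids: List[int],
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: keep only the scores of the provided ids, in the order provided."""
    # Map each returned id to its position in the results (assumes unique ids)
    position: Dict[int, int] = {doc_id: i for i, doc_id in enumerate(result_ids)}
    missing = [doc_id for doc_id in provided_ids if doc_id not in position]
    if missing:
        raise KeyError(f"ids not found in search results: {missing}")
    scores = [result_scores[position[doc_id]] for doc_id in provided_ids]
    return provided_ids, scores


# Invented ids and scores, for illustration only
ids, scores = select_provided([30, 10, 20], [0.9, 0.7, 0.4], [10, 30])
print(ids, scores)  # [10, 30] [0.7, 0.9]
```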
2 changes: 1 addition & 1 deletion semantic_search/schemas.py
@@ -20,7 +20,7 @@ class Search(BaseModel):
query: Document
documents: List[Document] = []
top_k: int = Field(10, gt=0, description="top_k must be greater than 0")
-use_docs: bool = False
+docs_only: bool = False

class Config:
schema_extra = {
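
For readers without the full schema at hand, the defaults and the `top_k` constraint can be sketched with the standard library alone. This is a stand-in built on `dataclasses`, not the project's pydantic model; the `Document` fields (`uid`, optional `text`) are assumed from the README text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    # Assumed shape: README mentions "uid" and an optional "text"
    uid: str
    text: Optional[str] = None


@dataclass
class Search:
    query: Document
    documents: List[Document] = field(default_factory=list)
    top_k: int = 10
    docs_only: bool = False

    def __post_init__(self) -> None:
        # Stand-in for the Field(10, gt=0, ...) constraint in the real schema
        if self.top_k <= 0:
            raise ValueError("top_k must be greater than 0")


request = Search(query=Document(uid="1"), docs_only=True)
print(request.top_k, request.docs_only)  # 10 True
```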
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -66,7 +66,7 @@ def followup_request_with_test() -> Request:
"text": "Members of TGFbeta superfamily are found to play important roles in many cellular...",
},
],
-"use_docs": True,
+"docs_only": True,
}
# We don't actually test scores, so use a dummy value of -1
response = [