update docs
michaelfeil committed Mar 16, 2024
1 parent a6711bf commit 8833f43
Showing 2 changed files with 35 additions and 24 deletions.
20 changes: 20 additions & 0 deletions docs/docs/deploy.md
@@ -12,6 +12,26 @@ docker run \
--model-name-or-path $model --port $port
```

### Extending the Dockerfile

Launching multiple models from one Dockerfile

Running multiple models on one GPU is experimental. You can use the following temporary workaround:
```Dockerfile
FROM michaelf34/infinity:latest
# Dockerfile-ENTRYPOINT for multiple models via multiple ports
ENTRYPOINT ["/bin/sh", "-c", \
"(. /app/.venv/bin/activate && infinity_emb --port 8080 --model-name-or-path sentence-transformers/all-MiniLM-L6-v2 &);\
(. /app/.venv/bin/activate && infinity_emb --port 8081 --model-name-or-path intfloat/e5-large-v2 )"]
```

You can build and run it via:
```bash
docker build -t custominfinity . && docker run -it --gpus all -p 8080:8080 -p 8081:8081 custominfinity
```

Both models now run as two server instances inside a single container. Alternatively, you can build your own FastAPI/Flask app that wraps Infinity's async API, as sketched below.
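
A minimal sketch of that approach, assuming the `AsyncEmbeddingEngine` and `EngineArgs` interface of `infinity_emb` (not the official server implementation):
```python
# Minimal sketch, not the official server: wrap Infinity's async engine in FastAPI.
from contextlib import asynccontextmanager

from fastapi import FastAPI
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2")
)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # start/stop the engine's batching loop together with the app
    async with engine:
        yield

app = FastAPI(lifespan=lifespan)

@app.post("/embed")
async def embed(texts: list[str]):
    embeddings, usage = await engine.embed(sentences=texts)
    return {"embeddings": [e.tolist() for e in embeddings], "usage": usage}
```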


### dstack
dstack allows you to provision a VM instance on the cloud of your choice.
39 changes: 15 additions & 24 deletions docs/docs/index.md
@@ -1,6 +1,8 @@
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under [MIT License](https://github.com/michaelfeil/infinity/blob/main/LICENSE). Infinity powers inference behind [Gradient.ai](https://gradient.ai).
# [Infinity](https://github.com/michaelfeil/infinity)

## Why Infinity:
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under [MIT License](https://github.com/michaelfeil/infinity/blob/main/LICENSE). Infinity powers inference behind [Gradient.ai](https://gradient.ai) and other Embedding API providers.

## Why Infinity

Infinity provides the following features:

@@ -10,7 +12,7 @@ Infinity provides the following features:
* **Correct and tested implementation**: Unit and end-to-end tested. Embeddings via infinity are identical to [SentenceTransformers](https://github.com/UKPLab/sentence-transformers/) (up to numerical precision). Lets API users create embeddings till infinity and beyond.
* **Easy to use**: The API is built on top of [FastAPI](https://fastapi.tiangolo.com/); [Swagger](https://swagger.io/) makes it fully documented. The API is aligned to [OpenAI's Embedding specs](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings). See below on how to get started.

# Getting started
## Getting started

Install `infinity_emb` via pip
```bash
pip install infinity-emb[all]
```
@@ -46,7 +48,7 @@ Check the `--help` command to get a description for all parameters.
```bash
infinity_emb --help
```
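
Since the API follows OpenAI's embedding spec, the official `openai` Python client can target a running Infinity server. A hedged sketch, assuming Infinity's default port 7997 and that the server was launched with the model named below:
```python
# Hedged sketch: point the openai client at a local Infinity server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:7997", api_key="sk-not-needed")
response = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",  # hypothetical; must match the model the server was started with
    input=["Embed this sentence via Infinity."],
)
print(len(response.data[0].embedding))  # dimensionality of the returned vector
```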

### Launch FAQ:
## Launch FAQ
<details>
<summary>What are embedding models?</summary>
Embedding models can map any text to a low-dimensional dense vector which can be used for tasks like retrieval, classification, clustering, or semantic search.
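
As a toy illustration of how such vectors are compared (illustrative only, independent of Infinity's API):
```python
# Toy example: compare embedding vectors by cosine similarity.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query, doc = np.random.rand(2, 384)  # stand-ins for real embedding vectors
print(cosine_similarity(query, doc))  # closer to 1.0 means more similar
```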
@@ -73,30 +75,19 @@ infinity_emb --help

</details>

<details>
<summary>Launching multiple models in one dockerfile</summary>

Multiple models on one GPU is in experimental mode. You can use the following temporary solution:
```Dockerfile
FROM michaelf34/infinity:latest
# Dockerfile-ENTRYPOINT for multiple models via multiple ports
ENTRYPOINT ["/bin/sh", "-c", \
"(. /app/.venv/bin/activate && infinity_emb --port 8080 --model-name-or-path sentence-transformers/all-MiniLM-L6-v2 &);\
(. /app/.venv/bin/activate && infinity_emb --port 8081 --model-name-or-path intfloat/e5-large-v2 )"]
```

You can build and run it via:
```bash
docker build -t custominfinity . && docker run -it --gpus all -p 8080:8080 -p 8081:8081 custominfinity
```

Both models now run on two instances in one dockerfile servers. Otherwise, you could build your own FastAPI/flask instance, which wraps around the Async API.

</details>

<details>
<summary>Using Langchain with Infinity</summary>
Now available under the Integrations section in the side panel.
</details>
<details>
<summary>Question not answered here?</summary>
There is a Discussions section on Infinity's GitHub:
https://github.com/michaelfeil/infinity/discussions
</details>
