How to run or access infinity on a hf space? #162

ffreemt · 2024-03-19T09:58:07Z


Hi. Thanks for the wonderful project.

Is it possible to directly deploy infinity on a hf space?

I guess it's possible to do it via gradio. But all I need is just embeddings. So I wonder whether I can simply run something like infinity_emb --model-name-or-path sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 --port 7860 in a hf space and access the API.

I tried to deploy infinity on a hf space https://huggingface.co/spaces/mikeee/emb384. It seems to be running but I cannot figure out how to make a request to the API. There isn't anything at https://huggingface.co/spaces/mikeee/emb384/docs or https://huggingface.co/spaces/mikeee/emb384:7860/docs.


michaelfeil · 2024-03-19T15:22:00Z

Maintainer

Love the idea! I'm not sure how well you can expose a REST API on Hugging Face Spaces. I would follow this guide; effectively you need to use Gradio and not FastAPI (my guess): https://www.tomsoderlund.com/ai/building-ai-powered-rest-api

I would default to the Python API (example below), then add a REST API later.

import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="BAAI/bge-small-en-v1.5", engine="torch")
)

async def main(sentences=("Embed this sentence via Infinity.", "Paris is in France.")):
    async with engine:  # engine starts with engine.astart()
        embeddings, usage = await engine.embed(sentences=sentences)
    # engine stops with engine.astop()
    return embeddings, usage

# call the function from any async func or from asyncio.run()
asyncio.run(main())


ffreemt · 2024-03-20T05:45:05Z

Author

I came across this: https://medium.com/@dahmanihichem01/mixtral-and-rest-api-turning-mixtral-8x7b-into-an-api-using-huggingface-spaces-a8b150b47246 and https://huggingface.co/spaces/iiced/mixtral-46.7b-fastapi, which appears to serve FastAPI directly from uvicorn (without using Gradio). Isn't infinity-emb also using uvicorn?

https://huggingface.co/spaces/iiced/mixtral-46.7b-fastapi/blob/main/main.py has

@app.post("/generate/")
async def generate_text(item: Item):
    return {"response": generate(item)}

while https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/infinity_server.py has:

    @app.post(
        f"{url_prefix}/embeddings",
        response_model=OpenAIEmbeddingResult,
        response_class=responses.ORJSONResponse,
    )

Since curl -X POST ... https://iiced-mixtral-46-7b-fastapi.hf.space/generate/ reaches the space at https://huggingface.co/spaces/iiced/mixtral-46.7b-fastapi, perhaps we can access infinity on a hf space in the same way, with curl -X POST ... https://foobar.hf.space/{url_prefix}/embeddings
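A minimal sketch of such a request in Python, using only the standard library. It assumes the server accepts an OpenAI-style JSON body (`model` and `input`), which the `OpenAIEmbeddingResult` response model above suggests but does not prove. The helper names, the `foobar` host, and the empty default for `url_prefix` are placeholders for illustration, not values confirmed in this thread.

```python
import json
import urllib.request


def build_embeddings_request(base_url, sentences, model, url_prefix=""):
    """Build (but do not send) a POST request for the embeddings route.

    Assumption: the body follows the OpenAI embeddings format
    ({"model": ..., "input": [...]}); url_prefix must match the server's setting.
    """
    url = f"{base_url.rstrip('/')}{url_prefix}/embeddings"
    body = json.dumps({"model": model, "input": list(sentences)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(request, timeout=30):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder host taken from the message above; replace with a real space URL.
request = build_embeddings_request(
    "https://foobar.hf.space",
    ["Paris is in France."],
    model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
)
print(request.full_url)
```

Calling `fetch_embeddings(request)` would actually send it; the sketch stops at building the request so that it can be inspected without network access.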


ffreemt Mar 20, 2024
Author

It works! For example, Swagger UI is at https://mikeee-emb384.hf.space/docs

All we need to know is the direct URL: https://{hf-username}-{space-name}.hf.space.
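As a rough illustration, the direct URL can be derived from the owner and space name. The helper `space_direct_url` is hypothetical, and its normalization rule is an assumption inferred only from the two URLs in this thread (`mixtral-46.7b-fastapi` appears as `mixtral-46-7b-fastapi` in the direct URL); the lowercasing in particular is a guess.

```python
import re


def space_direct_url(owner, space_name):
    """Return the assumed direct URL for a Hugging Face space.

    Assumption (inferred from the two URLs in this thread): the subdomain is
    "<owner>-<space-name>", lowercased, with any character other than a
    letter, digit, or hyphen replaced by a hyphen.
    """
    slug = f"{owner}-{space_name}".lower()
    slug = re.sub(r"[^a-z0-9-]", "-", slug)
    return f"https://{slug}.hf.space"


print(space_direct_url("mikeee", "emb384"))
print(space_direct_url("iiced", "mixtral-46.7b-fastapi"))
```

If the derived URL does not resolve, check the space page itself for the actual address rather than relying on this rule.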

Answer selected by ffreemt

michaelfeil Mar 20, 2024
Maintainer

@ffreemt Cool, that's awesome! What's the URL of the hf space / source code?

ffreemt · 2024-03-20T08:21:06Z

Author

It's just a plain Dockerfile in a hf docker space (sdk: docker in README.md). You need to set TRANSFORMERS_CACHE=/tmp/cache or some other writable directory; the default will result in a ....denied error.

Have a look at https://huggingface.co/spaces/mikeee/emb384/tree/main: Dockerfile, requirements.txt and start-infinity-emb.sh

requirements.txt (a single line, infinity-emb[all]) can in fact be removed (add pip install infinity-emb[all] to the Dockerfile instead), and start-infinity-emb.sh can probably also be incorporated into the Dockerfile.
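For illustration only, here is a hypothetical Python equivalent of such a start script; the actual contents of start-infinity-emb.sh are not shown in this thread. It reuses the flags from the command in the first post and sets TRANSFORMERS_CACHE to a writable directory. The function names and the default model are assumptions.

```python
import os
import subprocess


def build_launch(model, port=7860, cache_dir="/tmp/cache"):
    """Return (command, environment) for starting the infinity_emb server.

    The flags mirror the command quoted in the first post; cache_dir should
    be any directory the container user can write to.
    """
    command = ["infinity_emb", "--model-name-or-path", model, "--port", str(port)]
    env = dict(os.environ)
    env["TRANSFORMERS_CACHE"] = cache_dir
    return command, env


def launch(model="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"):
    """Create the cache directory and run the server until it exits."""
    command, env = build_launch(model)
    os.makedirs(env["TRANSFORMERS_CACHE"], exist_ok=True)
    subprocess.run(command, env=env, check=True)


# Show the command that launch() would run; nothing is started here.
command, env = build_launch(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
print(" ".join(command))
```

Inside the container, calling `launch()` would start the server; it requires infinity-emb[all] to be installed in the image.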


michaelfeil Mar 20, 2024
Maintainer

Cool, I might clone the space to showcase infinity.
Does it automatically pull the “latest” image every couple of days?

ffreemt Mar 21, 2024
Author

When the space is updated or restarted, the container will be built and deployed anew.
