
Running multiple models via Python server #119

Answered by michaelfeil
semoal asked this question in Q&A

Looks good - that's how the two classes are intended to be used. Here is how I have built it myself: https://github.com/michaelfeil/infinity/blob/main/docs/benchmarks/simple_app.py
In the `async def` teardown function, use `await astop()` on both engines.

Additional:

  • Further, I would run low batch sizes, such as 16 or 32, to avoid going out of memory on edge-case requests.
  • Make sure you have hf_transfer installed for maximum download speed.
  • Pin the version of infinity -> new updates might break compatibility (the space is evolving fast).
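The hf_transfer tip above is switched on through an environment variable; `HF_HUB_ENABLE_HF_TRANSFER` is the flag huggingface_hub checks, and it assumes the `hf_transfer` package has been pip-installed alongside a pinned infinity version:

```python
import os

# Opt in to the Rust-based hf_transfer downloader for faster model pulls.
# The flag must be set before huggingface_hub is imported, so do it first
# thing in your server entrypoint.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
```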

Answer selected by semoal
2 participants