-
For a particular reason I need the LangServe API to be responding even before a chain is fully ready. I'm fine if it just returns 500, rather than there being no API at all until the chain is constructed. Any best practice for doing so? Thanks
-
I'll expose the underlying API handler in a bit; that'll provide more flexibility to do things like this. Would you mind sharing a bit more about what the slow part of the chain is?
-
It's mostly loading of the model. Yes, some custom fallback handler would be awesome.
-
OK, it sounds like you want to differentiate between liveness and readiness of the server. When you're loading the model, is that happening in the global namespace, and is it a blocking operation?
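For anyone landing here later, the liveness/readiness split usually looks something like the sketch below. This is a minimal illustration assuming plain FastAPI; the `/healthz` and `/ready` paths and the `model_ready` flag are placeholder names, not anything LangServe provides.

```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
model_ready = False  # flipped to True once the slow startup work finishes

@app.get("/healthz")
async def liveness():
    # Liveness: the process is up and the event loop is serving requests.
    return {"status": "alive"}

@app.get("/ready")
async def readiness():
    # Readiness: report 503 until the model has finished loading, so
    # callers (or a load balancer) know to back off for now.
    if not model_ready:
        return JSONResponse(status_code=503, content={"status": "loading"})
    return {"status": "ready"}
```

A load balancer or Kubernetes probe can then route traffic based on `/ready` while `/healthz` answers immediately, which matches the "respond before the chain is fully ready" requirement from the original question.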
-
I have a FastAPI app, but the whole app won't be ready until the model is loaded and the chain is created. I could run a separate app, but that would be on a different port, which is not very firewall friendly. That's why I'm reaching out here for help. Thanks!
-
Being able to dynamically switch between chains or parameters in a chain opens up a lot of possibilities for developers.
-
@LinkLeong Dynamic switching between chains and parameters can be done using configurable runnables. Configurable runnables allow configuring parameters on any given chain as well as selecting from alternative chains to run. Please see: https://python.langchain.com/docs/expression_language/how_to/configure

@tigerinus The newest release exposes the underlying APIHandler: https://github.com/langchain-ai/langserve/blob/main/examples/api_handler_examples/server.py

If the call to load the model is blocking, you may be able to run it on a thread so that it loads in the background. I suspect you can achieve what you want by loading the model into a global variable from a thread and implementing a custom runnable that errors while the global variable is still None, but uses the global variable (in this case, a model) once it's set.

Also, just wondering: does the model that you're loading support concurrent usage? Are you expecting to serve multiple concurrent requests from users?
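For the configurable-runnables part, the linked docs boil down to something like the following. This is a hedged sketch: the exact import paths (`langchain_core.runnables`, `langchain_openai`) vary by LangChain version, and the model classes here are just examples.

```python
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# Configure a parameter on a chain: callers can override `temperature`
# per request via the `configurable` key in the config.
model = ChatOpenAI(temperature=0).configurable_fields(
    temperature=ConfigurableField(
        id="llm_temperature",
        name="LLM temperature",
        description="Sampling temperature for the LLM",
    )
)

model.invoke("pick a random number")  # uses temperature=0
model.with_config(configurable={"llm_temperature": 0.9}).invoke("pick a random number")

# Select between alternative chains at request time.
model_alts = ChatOpenAI(model="gpt-3.5-turbo").configurable_alternatives(
    ConfigurableField(id="llm"),
    default_key="gpt35",
    gpt4=ChatOpenAI(model="gpt-4"),
)
model_alts.with_config(configurable={"llm": "gpt4"}).invoke("hello")
```

And here is one way the background-loading idea could look. Again only a sketch: `load_model` is a stand-in for whatever slow loading call you have, and a `RunnableLambda` over a module-level variable is just one way to wire it up. Until loading finishes, requests raise, which the server surfaces as an error response (the "just return 500" behavior asked about above).

```python
import threading
from typing import Any, Optional

from langchain_core.runnables import RunnableLambda

_model: Optional[Any] = None  # set by the loader thread once ready

def load_model() -> Any:
    ...  # stand-in for the slow, blocking model load

def _load_in_background() -> None:
    global _model
    _model = load_model()

# Start loading without blocking startup; the HTTP server can begin
# accepting requests while this thread works.
threading.Thread(target=_load_in_background, daemon=True).start()

def _invoke(inputs: Any) -> Any:
    if _model is None:
        # Surfaced by the server as an error response until loading is done.
        raise RuntimeError("Model is still loading, please retry shortly")
    return _model.invoke(inputs)

chain = RunnableLambda(_invoke)
# then serve it with add_routes(app, chain, ...) as usual
```

One caveat: a Python thread only keeps the event loop responsive if the loading call spends its time in I/O or native code that releases the GIL.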
-
I'm facing the same issue. Can somebody please give some insight into model concurrency? I'm doing some basic tests with LangChain and LangServe, and requests fail when sent to the model while a previous request is still running in it.