-
For a particular reason I need the LangServe API to be responding even before a chain is fully ready. I'm fine if it just returns 500, rather than there being no API at all until the chain is constructed. Any best practice for doing so? Thanks
-
I'll expose the underlying API handler in a bit; that'll provide more flexibility to do things like this. Would you mind sharing a bit more about what the slow part of the chain is?
-
It's mostly loading of the model. Yes, some custom fallback handler would be awesome.
-
OK, it sounds like you want to differentiate between liveness and readiness of the server. When you're loading the model, is that happening in the global namespace, and is it a blocking operation?
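For anyone landing here later, the liveness/readiness split usually looks something like the sketch below. This is a minimal illustration assuming plain FastAPI; the `/healthz` and `/ready` paths and the `model_ready` flag are placeholder names, not anything LangServe provides.

```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
model_ready = False  # flipped to True once the slow startup work finishes

@app.get("/healthz")
async def liveness():
    # Liveness: the process is up and the event loop is serving requests.
    return {"status": "alive"}

@app.get("/ready")
async def readiness():
    # Readiness: report 503 until the model has finished loading, so
    # callers (or a load balancer) know to back off for now.
    if not model_ready:
        return JSONResponse(status_code=503, content={"status": "loading"})
    return {"status": "ready"}
```

A load balancer or Kubernetes probe can then route traffic based on `/ready` while `/healthz` answers immediately, which matches the "respond before the chain is fully ready" requirement from the original question.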
-
I have a FastAPI app, but the whole app won't be ready until the model is loaded and the chain is created. I could run a separate app, but that would be on a different port, which is not very firewall friendly. That's why I'm reaching out here for help. Thanks!
-
Being able to dynamically switch between chains or parameters in a chain opens up a lot of possibilities for developers.
-
@LinkLeong Dynamic switching between chains and parameters can be done using configurable runnables. Configurable runnables allow configuring parameters on any given chain as well as selecting from alternative chains to run. Please see: https://python.langchain.com/docs/expression_language/how_to/configure

@tigerinus The newest release exposes the underlying APIHandler: https://github.com/langchain-ai/langserve/blob/main/examples/api_handler_examples/server.py

If the call to load the model is blocking, you may be able to run it on a thread so that it loads in the background. I suspect you can achieve what you want by loading the model into a global variable from a thread and implementing a custom runnable that errors while the global variable is still None, but uses the global variable (in this case, a model) once it's set.

Also, just wondering: does the model that you're loading support concurrent usage? Are you expecting to serve multiple concurrent requests from users?
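For the configurable-runnables part, the linked docs boil down to something like the following. This is a hedged sketch: the exact import paths (`langchain_core.runnables`, `langchain_openai`) vary by LangChain version, and the model classes here are just examples.

```python
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# Configure a parameter on a chain: callers can override `temperature`
# per request via the `configurable` key in the config.
model = ChatOpenAI(temperature=0).configurable_fields(
    temperature=ConfigurableField(
        id="llm_temperature",
        name="LLM temperature",
        description="Sampling temperature for the LLM",
    )
)

model.invoke("pick a random number")  # uses temperature=0
model.with_config(configurable={"llm_temperature": 0.9}).invoke("pick a random number")

# Select between alternative chains at request time.
model_alts = ChatOpenAI(model="gpt-3.5-turbo").configurable_alternatives(
    ConfigurableField(id="llm"),
    default_key="gpt35",
    gpt4=ChatOpenAI(model="gpt-4"),
)
model_alts.with_config(configurable={"llm": "gpt4"}).invoke("hello")
```

And here is one way the background-loading idea could look. Again only a sketch: `load_model` is a stand-in for whatever slow loading call you have, and a `RunnableLambda` over a module-level variable is just one way to wire it up. Until loading finishes, requests raise, which the server surfaces as an error response (the "just return 500" behavior asked about above).

```python
import threading
from typing import Any, Optional

from langchain_core.runnables import RunnableLambda

_model: Optional[Any] = None  # set by the loader thread once ready

def load_model() -> Any:
    ...  # stand-in for the slow, blocking model load

def _load_in_background() -> None:
    global _model
    _model = load_model()

# Start loading without blocking startup; the HTTP server can begin
# accepting requests while this thread works.
threading.Thread(target=_load_in_background, daemon=True).start()

def _invoke(inputs: Any) -> Any:
    if _model is None:
        # Surfaced by the server as an error response until loading is done.
        raise RuntimeError("Model is still loading, please retry shortly")
    return _model.invoke(inputs)

chain = RunnableLambda(_invoke)
# then serve it with add_routes(app, chain, ...) as usual
```

One caveat: a Python thread only keeps the event loop responsive if the loading call spends its time in I/O or native code that releases the GIL.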
-
I'm facing the same issue. Can somebody please give some insight into model concurrency? I'm doing some basic tests with LangChain and LangServe, and requests fail when sent to the model while a previous request is still running in it.