-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup model routing config and plan routing to o1 #6189
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! This is a great start for model routing and LGTM!
Router that routes the prompt that is judged by a LLM as complex and requires a step-by-step plan. | ||
""" | ||
|
||
JUDGE_MODEL = 'gpt-4o' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be interesting to see if we can experiment with cheaper model for that 🤔
* Translating high-level requirements into detailed implementation steps and ensuring consistency. | ||
|
||
=== BEGIN USER MESSAGE === | ||
{message} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also experiment sending O1 with the last 5/10 action/observation 🤔 in case there's some deep reasoning required to figure out the error, etc.
) | ||
|
||
# Replace the model with the reasoning model | ||
kwargs['model'] = self.model_routing_config.reasoning_model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is model
enough, or also: custom provider, base URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could design the reasoning model not as a part of an LLM instance, but as a second LLM instance in the agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is model enough, or also: custom provider, base URL?
Yeah, I think we also need to allow user to set these, especially if they don't use via a llm proxy 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using [llm.reasoning_model]
will do it implicitly!
[model_routing] | ||
|
||
# The reasoning model to use for plan generation | ||
reasoning_model = "o1-preview-2024-09-12" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reasoning_model = "o1-preview-2024-09-12" | |
[llm.reasoning_model] | |
model = "o1-preview-2024-09-12" | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is also another approach, my thought is for now we only use reasoning models specifically for model routing, so I put it in this config group (with other values in the future). When we also use them for other purposes, we can probably move to llm-specific groups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that we can reuse the way we define a model (which will implicitly take care of the correct loading and init all base_url
etc).
It doesn't say which component of openhands
loads the definition of [llm.reasoning_model]
, it can be the routing component.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, if a user wants to use a reasoning model today, for the agent, they can do so. They just choose a reasoning model and configure it. Ability to use it isn't new?
We can just avoid to duplicate LLMConfig settings ("reasoning_model", "reasoning_model_base_url", "reasoning_model_api_key", "reasoning_model_aws...something" etc) into the new routing section, instead we can reference existing configurations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that sounds good to me, thanks for the suggestion! I'll try to address this after getting the routing behavior to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm so happy to see this, thank you! I do think we are missing some minimal framework to experiment with reasoning models.
About the way to choose another model:
We already have the ability to choose, configure, and use a random model, for example in evals: we can write the model configuration in toml, in a custom named LLM config section, [llm.o1]
, load it with an utility function, and instantiate an LLM from it.
We can use that here. Names are user-defined, and we can, if we want, set in stone a particular name for the reasoning model, e.g. [llm.reasoning_model]
, or [llm.oh_reasoning_model]
, or [llm.blueberry]
(or strawberry for that matter), whatever name.
param_value = '\n' + param_value + '\n' if is_multiline_value else param_value | ||
tool_call_str += f'<parameter={param_name}>{param_value}</parameter>\n' | ||
tool_call_str += '</function>' | ||
return tool_call_str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: it feels like this code is also in fn_call_converter.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but it doesn't contain extra info like turn number I think 🤔 Maybe we can also reuse it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I'm also planning to add extra markup about whether the turn is routed or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm changing my mind about reuse. I think it would be better to do it as you do it here.
(maybe all functions in trajectory.py
can be private, except format_trajectory
. They're used only here, it would make it clear. In addition, there's a single use of format_trajectory
from codeact.)
The reason is that there is, at minimum, some simplification we can do on fn_call_converter.py
, and it will be easier to refactor as it is now, than if it gets more responsibilities, use cases, parameters. It's more readable and easier to keep all these functions separate and clear, as you implemented them now.
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR is to:
Link of any specific issues this addresses