With their remarkable advancements, large language models (LLMs) have attracted significant efforts to develop LLM-based agents capable of executing intricate multi-step decision-making tasks. Existing approaches predominantly rely on an external performance measure to guide the decision-making process, but such reliance on an external prior is problematic in real-world scenarios, where the prior may be unavailable, flawed, or even erroneous. For genuinely autonomous decision-making, LLM-based agents must develop rationality from their posterior experiences so that they can judge the utility of each decision independently.
In this work, we propose RaDAgent (Rational Decision-Making Agent), which fosters the development of its rationality through an iterative framework involving Experience Exploration and Utility Learning. Within this framework, Elo-based Utility Learning is devised to assign Elo scores to individual decision steps to judge their utilities via pairwise comparisons. Consequently, these Elo scores guide the decision-making process to derive optimal outcomes.
Experimental results on the Game of 24, WebShop, ToolBench and RestBench datasets demonstrate RaDAgent's superiority over baselines, achieving about 7.8% improvement on average. In addition, RaDAgent also reduces costs (ChatGPT API calls), highlighting its effectiveness and efficiency.
Our paper is released here.
Here is a guide on how to run the RaDAgent method on different downstream tasks.
You need to first clone the repo and install the project dependencies:
```bash
python -m pip install -e ./
```
Then, before you start, you need to put your OpenAI API keys in `ets_utils.py`.
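A minimal sketch of what this might look like (the exact variable names in `ets_utils.py` may differ, so check the file for where the key is actually read):

```python
# Illustrative only -- the real variable names in ets_utils.py may differ.
# Fill in your own credentials where the file reads the OpenAI key, e.g.:
OPENAI_API_KEY = "sk-your-key-here"            # your OpenAI API key
OPENAI_API_BASE = "https://api.openai.com/v1"  # override only if you use a proxy endpoint
```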
To run the downstream tasks we report in the paper, follow the guides below.
First, start the WebShop environment server by running the following commands:
```bash
cd run_webshop
python env_server.py
```
The IP and port can be set in `env_server.py`:
```python
host = "localhost"  # your host IP
port = 12348        # your host port
```
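Once the server is running, a quick sanity check (this snippet is not part of the repo) to confirm it is reachable on the configured host and port:

```python
# Sanity check: confirm the environment server is listening on the configured host/port.
import socket

with socket.create_connection(("localhost", 12348), timeout=5):
    print("env_server is reachable on localhost:12348")
```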
Then run the task:
```bash
cd run_webshop
bash run_task.sh
```
You can set the connection IP and port in `run_webshop/webshopping_env_online.py` (they should match the server settings above).
To evaluate the results:
```bash
cd run_webshop
bash eval.sh
```
Configure the REST API environment (TMDB, Spotify) by following the instructions in RestGPT.
Then start the RestBench environment server by running the following commands:
```bash
cd run_restbench
python env_server.py
```
The IP and port can be set in `env_server.py`:
```python
host = "localhost"  # your host IP
port = 12348        # your host port
```
Then run the task:
```bash
cd run_restbench
bash run_task.sh
```
You can set the connection IP and port in `run_restbench/restbench_env.py`.
To evaluate the results:
```bash
cd run_restbench
python restbench_eval.py
```
We follow the setting of ToT, which uses `./Downstream_tasks/24.csv` and tests on the last 100 problems, which are the hardest in the dataset. Then you can run the experiment:
```bash
python ./test_codes/test_24.py
```
You need to specify some arguments, such as the input and output directories and `process_num`, in that Python file. This code is not only for Elo tree search; you can also specify the DFS, DFSDT, or BFS method via `--method`.
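As a rough illustration of the kind of arguments involved (the names below are hypothetical; the actual ones are defined inside `test_24.py` and may differ):

```python
# Illustrative only -- check test_24.py for the real argument names and defaults.
import argparse

parser = argparse.ArgumentParser(description="Game of 24 runner (sketch)")
parser.add_argument("--input_path", default="./Downstream_tasks/24.csv", help="task file")
parser.add_argument("--output_dir", default="./results/game24", help="where to write results")
parser.add_argument("--process_num", type=int, default=1, help="number of worker processes")
parser.add_argument("--method", default="DFSDT", help="an ETS_* string, DFS, DFSDT, or BFS")
args = parser.parse_args()
print(args)
```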
We also implement an MCTS baseline based on the traditional UCT tree-search method. It has several variants, which differ in the `get_reward` function; you can specify which one to use. To run the baseline, use the following command:
```bash
python ./test_codes/MCTS.py
```
MCTS can also find a result for each case, but it uses about 100 times more simulations than ETS. We give a comparison in our paper.
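For reference, here is a minimal sketch of the standard UCT selection rule that this kind of baseline builds on (the node fields and constants are illustrative, not the repo's actual implementation):

```python
import math

def uct_select(children, exploration_c=1.41):
    """Pick the child with the highest UCT score: mean reward plus an exploration bonus."""
    total_visits = sum(child["visits"] for child in children)

    def uct_score(child):
        if child["visits"] == 0:
            return float("inf")  # always try unvisited children first
        mean_reward = child["value"] / child["visits"]
        bonus = exploration_c * math.sqrt(math.log(total_visits) / child["visits"])
        return mean_reward + bonus

    return max(children, key=uct_score)

# Example: three children with accumulated reward and visit counts.
children = [{"value": 3.0, "visits": 5}, {"value": 1.0, "visits": 1}, {"value": 0.0, "visits": 0}]
print(uct_select(children))  # the unvisited child is selected first
```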
Testing on ToolBench is a little more complex. You need to:
- Request a ToolServer `toolbench_key` by following the guide here; you can also build the ToolServer locally. After getting a toolbench key, you need to specify `toolbench_key` in `ets_utils.py`.
- Add dependencies: we use an early version of ToolBench, so you need to unzip `assets.zip` to extract the dependencies.
After setting up the ToolBench environment, you can run:
```bash
python answer_generation.py
```
In our experiment, we use the split `./assets/toolbench_test_data_0925/test_query_ids`. You can try different methods via `--method`, such as DFS, BFS, or ETS. The hyperparameters are explained in the following section.
Because gpt-3.5-turbo-0613 can no longer be requested from OpenAI, and the main experiments were performed in July 2023, many RapidAPI servers no longer exist today, so the scores may not be reproducible now. However, we have kept the original ToolBench test results of our main experiment. If you have any problems reproducing the ToolBench experiments, you can contact [email protected].
| Model | Game of 24 | WebShop | ToolBench |
| --- | --- | --- | --- |
| CoT | 6.00 | 56.23 | 16.60 |
| CoT@3 | 7.00 | 56.45 | 31.20 |
| Reflexion | 7.00 | 57.21 | 26.60 |
| ToT-BFS | 11.00 | 50.20 | 38.00 |
| ToT-DFS | 14.00 | 55.60 | 45.58 |
| DFSDT | 29.00 | 57.25 | 50.20 |
| RaDAgent | 43.00 | 59.36 | 61.92 |
| Model | TMDB | Spotify |
| --- | --- | --- |
| Offline | 33.0 | 36.4 |
| DEPS | 43.0 | 43.8 |
| ReAct | 57.0 | 49.1 |
| Reflexion | 59.0 | 61.4 |
| RestGPT | 79.0 | 74.5 |
| RestGPT(ChatGPT) | 65.0 | 72.3 |
| RaDAgent | 84.0 | 80.7 |
Here, we parse the method name you pass and map it to a specific configuration, e.g. an ETS method such as `ETS_all-100_annealing_k50_sqrt_s100_f1_t173.72_p0.9_c15_m3_rn1_rg3`. We mainly split the name by `_`; the parsing logic is in `./test_codes/test_24.py`, lines 340-380.
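As an illustration of how such a name decomposes (the field names below are hypothetical; the real parsing lives in `./test_codes/test_24.py` and handles more cases):

```python
# Illustrative sketch: split an ETS method name by "_" into its hyperparameter fields.
import re

name = "ETS_all-100_annealing_k50_sqrt_s100_f1_t173.72_p0.9_c15_m3_rn1_rg3"
fields = name.split("_")

config = {"method": fields[0]}
# Map a letter prefix to a readable hyperparameter name (names are illustrative).
prefixes = {"k": "elo_k", "s": "simulations", "f": "forward_candidates",
            "t": "temperature", "p": "explore_prob", "c": "max_children", "m": "backward_every"}
for field in fields[1:]:
    match = re.fullmatch(r"([a-z]+)([\d.]+)", field)
    if match and match.group(1) in prefixes:
        config[prefixes[match.group(1)]] = float(match.group(2))
print(config)
```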
For ETS:
- `ETS` means Elo-based Tree Search, the main method of RaD-Agent.
- `k50` controls how fast the Elo score is updated in the update equation; a bigger k updates the Elo scores faster. We set this to 50 in all experiments (a sketch of the update is given after this list).
- `s100` means we allow 100 simulations per task. (In the main experiment, we set it to 30.)
- `f1`: whether to generate multiple candidates in a forward pass. This equals 1 in DFSDT.
- `t173.72`: in the Elo algorithm, the temperature is set to 173.72.
- `p0.9`: the default probability of exploring a new child; a bigger p means more exploration. We set it to 0.5 in the main experiment.
- `c15`: the maximum number of children per node in the tree. In this example, if a node's child count is >= 15, we never generate new children for it.
- `m3`: perform an Elo backward update every 3 forward passes. A larger m saves API cost, but the Elo approximation may be more biased. We use 3 in our experiments.
- `rn1`, `rg3`: when performing a championship, we first let all the newer nodes compete rn times, then randomly select rg global matchings.
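As a rough sketch of how `k` and the temperature `t` might enter a standard Elo-style update after one pairwise comparison (the repo's actual implementation is in `./LLM/ETS.py` and may differ in details):

```python
def elo_update(score_a, score_b, a_wins, k=50.0, temperature=173.72):
    """Update two Elo scores after a pairwise comparison.

    `temperature` plays the role of the usual 400 scaling constant in classic Elo;
    `k` controls how fast scores move after each match.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((score_b - score_a) / temperature))
    outcome_a = 1.0 if a_wins else 0.0
    score_a += k * (outcome_a - expected_a)
    score_b += k * ((1.0 - outcome_a) - (1.0 - expected_a))
    return score_a, score_b

# Example: two decision nodes start at 0; node A wins the pairwise comparison.
print(elo_update(0.0, 0.0, a_wins=True))  # -> (25.0, -25.0)
```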
For DFS:
- `woFilter` means not performing pair-wise comparison in the forward process. In the original ToT DFS, the `Filter` method is used, which may add many API requests. The DFSDT method sets `woFilter`.
- `w` means the width of the tree, i.e. the maximum child count of one node.
For BFS: BFS follows a process similar to beam search.
- `w` means the beam-pool size.
- `e` means each candidate in the pool expands `e` children, so the total cost is approximately `depth * w * e`.
For CoT: `CoT@n` means to try n times, with early stopping when a result is found.
For Reflexion: `Reflexion@n` means to try n times, with early stopping when a result is found.
You can read through `./LLM/ETS.py` to see the implementation of the Elo-based tree-search algorithm.
Feel free to cite us if you like our work.
```bibtex
@article{ye2023rational,
  title={Rational Decision-Making Agent with Internalized Utility Judgment},
  author={Ye, Yining and Cong, Xin and Tian, Shizuo and Qin, Yujia and Liu, Chong and Lin, Yankai and Liu, Zhiyuan and Sun, Maosong},
  journal={arXiv preprint arXiv:2308.12519},
  year={2023}
}
```
Our project's source code is licensed under the Apache 2.0 License. This license permits the use, modification, and distribution of the code, subject to certain conditions outlined in the Apache 2.0 License.
If you have any questions, please feel free to contact me at [email protected].