Commit

fix instruct train
xingyaoww committed Jan 24, 2025
1 parent c11571a commit 4d92412
Showing 3 changed files with 7 additions and 4 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -36,12 +36,12 @@ Works for model <= 1.5B
For Qwen2.5-0.5B base, we know it fails to learn reasoning.

```
-export CUDA_VISIBLE_DEVICES=7
+export CUDA_VISIBLE_DEVICES=3
export N_GPUS=1
-export BASE_MODEL=Qwen/Qwen2.5-1.5B
+export BASE_MODEL=Qwen/Qwen2.5-0.5B
export DATA_DIR=$HOME/data/countdown
export WANDB_API_KEY=0929e692448f1bc929d71d7e3ece80073c3041e6
-export EXPERIMENT_NAME=countdown-qwen2.5-1.5b
+export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS
PYTHONUNBUFFERED=1 python3 -m verl.trainer.main_ppo \
@@ -171,7 +171,7 @@ python examples/data_preprocess/countdown.py --template_type=qwen-instruct --loc
Then use this data to train the instruct model.

```
-export CUDA_VISIBLE_DEVICES=4,5
+export CUDA_VISIBLE_DEVICES=0,1
export N_GPUS=2
export BASE_MODEL=Qwen/Qwen2.5-3B-Instruct
export DATA_DIR=$HOME/data/countdown-qwen-instruct
1 change: 1 addition & 0 deletions examples/data_preprocess/countdown.py
@@ -53,6 +53,7 @@ def gen_dataset(
 def make_prefix(dp, template_type):
     target = dp['target']
     numbers = dp['nums']
+    # NOTE: also need to change reward_score/countdown.py
     if template_type == 'base':
         """This works for any base model"""
         prefix = f"""A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer.
2 changes: 2 additions & 0 deletions verl/utils/reward_score/countdown.py
@@ -9,6 +9,8 @@ def extract_solution(solution_str):
     # Remove everything before the first "Assistant:"
     if "Assistant:" in solution_str:
         solution_str = solution_str.split("Assistant:", 1)[1]
+    elif "<|im_start|>assistant" in solution_str:
+        solution_str = solution_str.split("<|im_start|>assistant", 1)[1]
     else:
         return None
     solution_str = solution_str.split('\n')[-1]
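
The branch added in this hunk can be exercised in isolation. The following is a minimal sketch, not the repository's code: `strip_prompt` and `last_line` are hypothetical helper names, and the sample strings are made up for illustration. It mirrors the order of the checks in the hunk (the base-model `Assistant:` marker first, then the instruct-model `<|im_start|>assistant` marker) and returns `None` when neither marker is present.

```python
from typing import Optional

# Markers taken from the hunk above; the helper names are hypothetical.
BASE_MARKER = "Assistant:"
INSTRUCT_MARKER = "<|im_start|>assistant"


def strip_prompt(solution_str: str) -> Optional[str]:
    """Return the text after the first assistant marker, or None if no marker is found."""
    # Check the base-template marker first, then the instruct one, as the hunk does.
    for marker in (BASE_MARKER, INSTRUCT_MARKER):
        if marker in solution_str:
            return solution_str.split(marker, 1)[1]
    return None


def last_line(solution_str: str) -> Optional[str]:
    """Keep only the final line of the model response, as the hunk's last line does."""
    response = strip_prompt(solution_str)
    if response is None:
        return None
    return response.split("\n")[-1]


# Made-up sample inputs for the two template styles.
base_sample = "User: make 10 from [2, 5]\nAssistant: <think>2*5</think>\n<answer>2*5</answer>"
instruct_sample = (
    "<|im_start|>user\nmake 10 from [2, 5]<|im_end|>\n"
    "<|im_start|>assistant\nLet me think.\n<answer>2*5</answer>"
)

print(last_line(base_sample))
print(last_line(instruct_sample))
print(last_line("no marker here"))
```

Before this commit, an instruct-formatted sample like `instruct_sample` would have fallen through to the `else` branch and returned `None`, since it contains no `Assistant:` marker.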
