Full-parameter SFT fails with "No module named 'transformers.models.gemma'" #5760

Open
RZFan525 opened this issue Oct 21, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-4.19.91-014.15-kangaroo.alios7.x86_64-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • PyTorch version: 2.5.0+cu124 (GPU)
  • Transformers version: 4.45.2
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA A100-SXM4-80GB

Reproduction

I want to run full-parameter SFT fine-tuning on llama-3-chinese:

if [ -z "${BASH_VERSION}" ]; then
	echo "Please use bash to run this script." >&1
	exit 1
fi

# set some arguments here
ckpts_dir="/nas/shared/GAIR/ckpts"
output_dir="/cpfs01/shared/GAIR/GAIR_hdd/rzfan"
BASE_MODEL="/nas/shared/GAIR/ckpts/llama-3-chinese/8b"
OUTPUT_DIR=${output_dir}/TranslationDataDetection/llama3_Chinese_8b_GSM8K-ZH-train-1k_sft_answer_loss
NUM_GPUS=8
BATCH_SIZE_PER_GPU=16
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
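# With the values above: 128 / 8 GPUs / 16 per GPU = 1 accumulation step,
# i.e. an effective global batch size of 128.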

echo -e "\
      base model path: ${BASE_MODEL}\n\
      GPU number: ${NUM_GPUS}\n\
      batch size per GPU: ${BATCH_SIZE_PER_GPU}\n\
      gradient accumulation steps: ${GRADIENT_ACC_STEPS}\n\
      output path: ${OUTPUT_DIR}\n\
      "

mkdir -p "${OUTPUT_DIR}"
# OUTPUT_DIR="$(cd "${OUTPUT_DIR}" &>/dev/null && pwd)"
# if [[ ! -f "${OUTPUT_DIR}/.gitignore" ]]; then
# 	echo '*' >"${OUTPUT_DIR}/.gitignore"
# fi

exec > >(tee "${OUTPUT_DIR}/stdout.log") 2> >(tee "${OUTPUT_DIR}/stderr.log" >&2)


# the script is based on the settings in the Tulu 2 paper; modify them here according to your own needs
# for detailed guidance on the parameters, run: python src/train.py --help
# REMEMBER to adjust the batch size when using fewer than 8 GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port=9701 src/train.py \
    --deepspeed ds_config.json \
    --stage sft \
    --do_train True \
    --model_name_or_path ${BASE_MODEL} \
    --dataset GSM8K_ZH_train \
    --template llama3 \
    --cutoff_len 2048 \
    --finetuning_type full \
    --temperature 0 \
    --output_dir ${OUTPUT_DIR} \
    --overwrite_cache \
    --per_device_train_batch_size ${BATCH_SIZE_PER_GPU} \
    --weight_decay 0.1 \
    --gradient_accumulation_steps ${GRADIENT_ACC_STEPS} \
    --lr_scheduler_type linear \
    --warmup_ratio 0.1 \
    --logging_steps 1 \
    --save_steps 1000 \
    --learning_rate 2e-5 \
    --num_train_epochs 10 \
    --plot_loss \
    --report_to "wandb" \
    --bf16 True \
    --tf32 False \
    --flash_attn auto \
    --overwrite_output_dir \
    --train_on_prompt False \
    --max_samples 1000

OUTPUT_DIR="$(cd "${OUTPUT_DIR}" &>/dev/null && pwd)"
if [[ ! -f "${OUTPUT_DIR}/.gitignore" ]]; then
	echo '*' >"${OUTPUT_DIR}/.gitignore"
fi
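
The launch command above references ds_config.json, but that file is not included in the issue. As a point of reference only, a minimal ZeRO-2 configuration that defers batch size and precision to the trainer might look like the sketch below (every value here is an assumption, not the config actually used in this run):

cat > ds_config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
EOF

The "auto" values are resolved by the HF Trainer's DeepSpeed integration from the command-line arguments, which keeps the config consistent with flags such as --per_device_train_batch_size and --bf16.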

The error message is:

import peft                                                              
  File "/opt/conda/lib/python3.10/site-packages/peft/__init__.py", line 22, in <module>   
    from .auto import (                                                      
  File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 32, in <module>           
    from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
  File "/opt/conda/lib/python3.10/site-packages/peft/mapping.py", line 22, in <module>
    from .mixed_model import PeftMixedModel
  File "/opt/conda/lib/python3.10/site-packages/peft/mixed_model.py", line 26, in <module>
    from peft.tuners.mixed import COMPATIBLE_TUNER_TYPES
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in <module>
    from .lora import LoraConfig, LoraModel, LoftQConfig
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora/__init__.py", line 20, in <module>
    from .model import LoraModel
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 49, in <module>
    from .awq import dispatch_awq
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora/awq.py", line 26, in <module>
    from awq.modules.linear import WQLinear_GEMM
  File "/opt/conda/lib/python3.10/site-packages/awq/__init__.py", line 2, in <module>
    from awq.models.auto import AutoAWQForCausalLM
  File "/opt/conda/lib/python3.10/site-packages/awq/models/__init__.py", line 1, in <module>
    from .mpt import MptAWQForCausalLM 
  File "/opt/conda/lib/python3.10/site-packages/awq/models/mpt.py", line 1, in <module>
    from .base import BaseAWQForCausalLM
  File "/opt/conda/lib/python3.10/site-packages/awq/models/base.py", line 46, in <module>
    from awq.quantize.quantizer import AwqQuantizer
  File "/opt/conda/lib/python3.10/site-packages/awq/quantize/quantizer.py", line 10, in <module>
    from awq.quantize.scale import apply_scale, apply_clip
  File "/opt/conda/lib/python3.10/site-packages/awq/quantize/scale.py", line 9, in <module>
    from transformers.models.gemma.modeling_gemma import GemmaRMSNorm
ModuleNotFoundError: No module named 'transformers.models.gemma'
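
For context, the chain in the traceback is peft → awq (autoawq) → transformers.models.gemma, and transformers.models.gemma has shipped since transformers v4.38, so with the reported 4.45.2 this import should succeed. A quick sanity check of which transformers the launched interpreter actually resolves (a sketch that assumes it is run in the same environment deepspeed uses):

python - <<'EOF'
import importlib.util
import transformers

# If find_spec returns None here, the interpreter is resolving an older
# or shadowed transformers install rather than the reported 4.45.2.
print("transformers:", transformers.__version__, "from", transformers.__file__)
print("gemma module:", importlib.util.find_spec("transformers.models.gemma"))
EOF
# peft only descends into awq when autoawq is importable, so its build matters too
pip show autoawq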

Expected behavior

Training should proceed normally.

Others

No response

github-actions bot added the pending (This problem is yet to be addressed) label on Oct 21, 2024