
Can't convert to HF with convert_clip_original_pytorch_to_hf.py #17

Open · betterftr opened this issue Nov 4, 2024 · 0 comments

betterftr commented Nov 4, 2024

(I trained with ft-B-train-OpenAI-CLIP-ViT-L-14, converted with ft-C-convert-for-SDXL-comfyUI-OpenAI-CLIP, and then tried to convert to HF and extract the TE; the goal is to drop it in as SD 3.5 Large's tenc1.)

```
convert_clip_original_pytorch_to_hf.py", line 157, in <module>
    convert_clip_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path)
  File "C:\OneTrainer\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\OneTrainer\CLIP-fine-tune\Convert-for-HuggingFace-Spaces-etc\convert_clip_original_pytorch_to_hf.py", line 121, in convert_clip_checkpoint
    pt_model, _ = load(checkpoint_path, device="cpu", jit=False)
  File "C:\OneTrainer\venv\lib\site-packages\clip\clip.py", line 136, in load
    state_dict = torch.load(opened_file, map_location="cpu")
  File "C:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 1384, in load
    return _legacy_load(
  File "C:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 1628, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
```
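For reference, pickle raises `EOFError: Ran out of input` when it is handed an empty or truncated file, so `torch.load` here is most likely pointing at a checkpoint that was never fully written, or at a file in a different format (e.g. safetensors) rather than a torch pickle. A quick sanity check before re-running the converter (the path is just a placeholder for the checkpoint being converted):

```python
import os

ckpt = "C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/my-finetune.pt"  # placeholder path

# An empty or truncated file is the usual cause of "Ran out of input".
print("size on disk:", os.path.getsize(ckpt), "bytes")

# The first bytes hint at the actual format:
#   b"PK\x03\x04"  -> zip archive (modern torch.save)
#   b"\x80"        -> raw pickle (legacy torch.save)
#   neither        -> probably safetensors or some other format
with open(ckpt, "rb") as f:
    print("magic bytes:", f.read(4))
```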

As the title says, the conversion fails, and extracting the TE also only outputs a 1 kB file:

(screenshot of the truncated output file)

Update: after some experimentation I managed to do the conversion like this, and the result now loads with SD 3.5:

```python
import torch
from transformers import CLIPTextModelWithProjection, CLIPTextConfig

# Load the fine-tuned model and extract the state_dict
full_model = torch.load("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/my-finetune.pt")
state_dict = full_model.state_dict() if hasattr(full_model, "state_dict") else full_model

# Load the configuration and create the model
config = CLIPTextConfig.from_pretrained("C:/train/sd3.5/text_encoder/config.json")
fine_tuned_model = CLIPTextModelWithProjection(config)

# Load the state_dict into the fine-tuned model
fine_tuned_model.load_state_dict(state_dict, strict=False)

# Save only the text encoder part
fine_tuned_model.save_pretrained("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/")
```
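One caveat with the snippet above: `strict=False` silently skips any keys that don't line up with the `CLIPTextModelWithProjection` architecture, so the load can "succeed" while leaving most weights at random init. `load_state_dict` returns the mismatches, which is worth printing once:

```python
# load_state_dict returns (missing_keys, unexpected_keys); a long
# missing_keys list means the text encoder was not actually loaded.
result = fine_tuned_model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(result.missing_keys))        # expected by the model, absent from the checkpoint
print("unexpected keys:", len(result.unexpected_keys))  # e.g. vision-tower weights that get dropped
```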

Interestingly, the converted, extracted text encoder works with Stable Diffusion 3.5 (loaded as CLIPTextModelWithProjection) but not with Flux (switching the class to CLIPTextModel).
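A quick round-trip check confirms whether the exported folder stands on its own as a valid HF checkpoint; the tokenizer repo below is an assumption based on the fine-tune being OpenAI ViT-L/14:

```python
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

# Reload the exported text encoder as a plain HF checkpoint
te = CLIPTextModelWithProjection.from_pretrained("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/")
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")  # assumed tokenizer

with torch.no_grad():
    out = te(**tok(["a test prompt"], return_tensors="pt"))
print(out.text_embeds.shape)  # projected text embedding, (1, 768) for ViT-L/14
```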
