
Can't convert to HF with convert_clip_original_pytorch_to_hf.py #17

Open · betterftr opened this issue Nov 4, 2024 · 0 comments

betterftr commented Nov 4, 2024

(I trained with ft-B-train-OpenAI-CLIP-ViT-L-14, converted with ft-C-convert-for-SDXL-comfyUI-OpenAI-CLIP, and then tried to convert to HF and extract the TE; the goal is to drop it in as SD 3.5 Large's tenc1.)

```
convert_clip_original_pytorch_to_hf.py", line 157, in <module>
    convert_clip_checkpoint(args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path)
  File "C:\OneTrainer\venv\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\OneTrainer\CLIP-fine-tune\Convert-for-HuggingFace-Spaces-etc\convert_clip_original_pytorch_to_hf.py", line 121, in convert_clip_checkpoint
    pt_model, _ = load(checkpoint_path, device="cpu", jit=False)
  File "C:\OneTrainer\venv\lib\site-packages\clip\clip.py", line 136, in load
    state_dict = torch.load(opened_file, map_location="cpu")
  File "C:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 1384, in load
    return _legacy_load(
  File "C:\OneTrainer\venv\lib\site-packages\torch\serialization.py", line 1628, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
```
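For reference, pickle raises `EOFError: Ran out of input` when it is handed an empty or truncated file, so `torch.load` here is most likely pointing at a checkpoint that was never fully written, or at a file in a different format (e.g. safetensors) rather than a torch pickle. A quick sanity check before re-running the converter (the path is just a placeholder for the checkpoint being converted):

```python
import os

ckpt = "C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/my-finetune.pt"  # placeholder path

# An empty or truncated file is the usual cause of "Ran out of input".
print("size on disk:", os.path.getsize(ckpt), "bytes")

# The first bytes hint at the actual format:
#   b"PK\x03\x04"  -> zip archive (modern torch.save)
#   b"\x80"        -> raw pickle (legacy torch.save)
#   neither        -> probably safetensors or some other format
with open(ckpt, "rb") as f:
    print("magic bytes:", f.read(4))
```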

As the title says, the conversion fails, and extracting the TE also only outputs a 1 kB file:

(screenshot of the truncated output file)

Update: after some experimentation I managed to do the conversion like this, and the result now loads with SD 3.5:

```python
import torch
from transformers import CLIPTextModelWithProjection, CLIPTextConfig

# Load the fine-tuned model and extract the state_dict
full_model = torch.load("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/my-finetune.pt")
state_dict = full_model.state_dict() if hasattr(full_model, "state_dict") else full_model

# Load the configuration and create the model
config = CLIPTextConfig.from_pretrained("C:/train/sd3.5/text_encoder/config.json")
fine_tuned_model = CLIPTextModelWithProjection(config)

# Load the state_dict into the fine-tuned model
fine_tuned_model.load_state_dict(state_dict, strict=False)

# Save only the text encoder part
fine_tuned_model.save_pretrained("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/")
```
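One caveat with the snippet above: `strict=False` silently skips any keys that don't line up with the `CLIPTextModelWithProjection` architecture, so the load can "succeed" while leaving most weights at random init. `load_state_dict` returns the mismatches, which is worth printing once:

```python
# load_state_dict returns (missing_keys, unexpected_keys); a long
# missing_keys list means the text encoder was not actually loaded.
result = fine_tuned_model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(result.missing_keys))        # expected by the model, absent from the checkpoint
print("unexpected keys:", len(result.unexpected_keys))  # e.g. vision-tower weights that get dropped
```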

Interestingly, the converted, extracted text encoder works with Stable Diffusion 3.5 (loaded as CLIPTextModelWithProjection) but not with Flux (switching the class to CLIPTextModel).
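A quick round-trip check confirms whether the exported folder stands on its own as a valid HF checkpoint; the tokenizer repo below is an assumption based on the fine-tune being OpenAI ViT-L/14:

```python
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

# Reload the exported text encoder as a plain HF checkpoint
te = CLIPTextModelWithProjection.from_pretrained("C:/OneTrainer/CLIP-fine-tune/ft-checkpoints/")
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")  # assumed tokenizer

with torch.no_grad():
    out = te(**tok(["a test prompt"], return_tensors="pt"))
print(out.text_embeds.shape)  # projected text embedding, (1, 768) for ViT-L/14
```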
