Training script fails: Embedding indices is FloatTensor #14

Open
AmitMY opened this issue Feb 15, 2025 · 0 comments
Labels: bug (Something isn't working)

AmitMY (Contributor) commented Feb 15, 2025

Running the command:

multimodalhugs-train \
    --task "translation" \
    --model_name_or_path $MODEL_PATH \
    --processor_name_or_path $PROCESSOR_PATH \
    --run_name $MODEL_NAME \
    --dataset_dir $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --do_train True \
    --do_eval True \
    --fp16 \
    --label_smoothing_factor 0.1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --evaluation_strategy "steps" \
    --eval_steps 2000 \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 3 \
    --load_best_model_at_end true \
    --metric_for_best_model 'chrf' \
    --overwrite_output_dir \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-3 \
    --warmup_steps 20000 \
    --max_steps 200000 \
    --predict_with_generate True \
    --remove_unused_columns False

I get the following error:

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
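For reference, this is the generic error PyTorch raises whenever a float tensor is used as embedding indices. A minimal standalone sketch (independent of multimodalhugs) reproduces the same failure:

import torch
import torch.nn as nn

emb = nn.Embedding(384, 1472)    # same shape as the model's old_embeddings
ids = torch.tensor([[3, 7, 42]])

emb(ids)          # works: indices are int64 (LongTensor)
emb(ids.float())  # RuntimeError: Expected tensor for argument #1 'indices' to have
                  # one of the following scalar types: Long, Int; but got
                  # torch.FloatTensor instead (while checking arguments for embedding)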

Full Log
/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
WARNING:multimodalhugs.tasks.run_translation:Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: True
INFO:multimodalhugs.tasks.run_translation:Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=2000,
eval_strategy=IntervalStrategy.STEPS,
eval_use_gather_object=False,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.1,
learning_rate=0.001,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/output/runs/Feb15_10-12-41_u20-cva0ts0-509,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=200000,
metric_for_best_model=chrf,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=/scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/output,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=signwriting_transcription_model,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=2000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=3,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=20000,
weight_decay=0.0,
)
[INFO|configuration_utils.py:731] 2025-02-15 10:12:42,267 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model/config.json
[INFO|configuration_utils.py:800] 2025-02-15 10:12:42,268 >> Model config MultiModalEmbedderConfig(model_type='multimodal_embedder', feat_dim=534, feature_extractor_type=None, no_scale_embedding=False, pretrained_feature_extractor=None, freeze_feature_extractor=False, vl_mapper_type='linear', vl_mapper_layer_norm_before=True, vl_mapper_layer_norm=False, vl_mapper_activation=False, vl_factor=None, vl_mapper_dropout=0.1, freeze_vl_mapper=False, new_embeddings_vocab_size=11, backbone_used_vocab_size=384, init_lang_abbr='avg', freeze_new_embeddings=False, freeze_old_embeddings=False, backbone_name='t5', backbone_cfg=None, pretrained_backbone='google/byt5-small', freeze_backbone=False, encoder_embed_dim=1472, feature_extractor_cfg=None, is_encoder_decoder=True, pad_token_id=0, bos_token_id=None, eos_token_id=1, max_length=20)
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:42,269 >> Generate config GenerationConfig {
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|processing_utils.py:660] 2025-02-15 10:12:42,269 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/pose2text_translation_processor/processor_config.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,272 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,272 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,273 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,273 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2513] 2025-02-15 10:12:42,274 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:660] 2025-02-15 10:12:42,274 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/pose2text_translation_processor/processor_config.json
[WARNING|processing_utils.py:953] 2025-02-15 10:12:42,274 >> Some kwargs in processor config are unused and will not have any effect: reduce_holistic_poses. 
[INFO|processing_utils.py:722] 2025-02-15 10:12:42,276 >> Processor Pose2TextTranslationProcessor:
- tokenizer: ByT5Tokenizer(name_or_path='/scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/pose2text_translation_processor', vocab_size=256, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'additional_special_tokens': ['<extra_id_0>', '<extra_id_1>', '<extra_id_2>', '<extra_id_3>', '<extra_id_4>', '<extra_id_5>', '<extra_id_6>', '<extra_id_7>', '<extra_id_8>', '<extra_id_9>', '<extra_id_10>', '<extra_id_11>', '<extra_id_12>', '<extra_id_13>', '<extra_id_14>', '<extra_id_15>', '<extra_id_16>', '<extra_id_17>', '<extra_id_18>', '<extra_id_19>', '<extra_id_20>', '<extra_id_21>', '<extra_id_22>', '<extra_id_23>', '<extra_id_24>', '<extra_id_25>', '<extra_id_26>', '<extra_id_27>', '<extra_id_28>', '<extra_id_29>', '<extra_id_30>', '<extra_id_31>', '<extra_id_32>', '<extra_id_33>', '<extra_id_34>', '<extra_id_35>', '<extra_id_36>', '<extra_id_37>', '<extra_id_38>', '<extra_id_39>', '<extra_id_40>', '<extra_id_41>', '<extra_id_42>', '<extra_id_43>', '<extra_id_44>', '<extra_id_45>', '<extra_id_46>', '<extra_id_47>', '<extra_id_48>', '<extra_id_49>', '<extra_id_50>', '<extra_id_51>', '<extra_id_52>', '<extra_id_53>', '<extra_id_54>', '<extra_id_55>', '<extra_id_56>', '<extra_id_57>', '<extra_id_58>', '<extra_id_59>', '<extra_id_60>', '<extra_id_61>', '<extra_id_62>', '<extra_id_63>', '<extra_id_64>', '<extra_id_65>', '<extra_id_66>', '<extra_id_67>', '<extra_id_68>', '<extra_id_69>', '<extra_id_70>', '<extra_id_71>', '<extra_id_72>', '<extra_id_73>', '<extra_id_74>', '<extra_id_75>', '<extra_id_76>', '<extra_id_77>', '<extra_id_78>', '<extra_id_79>', '<extra_id_80>', '<extra_id_81>', '<extra_id_82>', '<extra_id_83>', '<extra_id_84>', '<extra_id_85>', '<extra_id_86>', '<extra_id_87>', '<extra_id_88>', '<extra_id_89>', '<extra_id_90>', '<extra_id_91>', '<extra_id_92>', '<extra_id_93>', '<extra_id_94>', '<extra_id_95>', '<extra_id_96>', '<extra_id_97>', '<extra_id_98>', '<extra_id_99>', '<extra_id_100>', '<extra_id_101>', '<extra_id_102>', '<extra_id_103>', '<extra_id_104>', '<extra_id_105>', '<extra_id_106>', '<extra_id_107>', '<extra_id_108>', '<extra_id_109>', '<extra_id_110>', '<extra_id_111>', '<extra_id_112>', '<extra_id_113>', '<extra_id_114>', '<extra_id_115>', '<extra_id_116>', '<extra_id_117>', '<extra_id_118>', '<extra_id_119>', '<extra_id_120>', '<extra_id_121>', '<extra_id_122>', '<extra_id_123>', '<extra_id_124>', '__pose__', '__gsg__', '__slf__', '__asq__', '__ssr__', '__ase__', '__ils__', '__sgg__', '__cse__', '__svk__', '__dse__']}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
        0: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        1: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        2: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        259: AddedToken("<extra_id_0>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        260: AddedToken("<extra_id_1>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        261: AddedToken("<extra_id_2>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        262: AddedToken("<extra_id_3>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        263: AddedToken("<extra_id_4>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        264: AddedToken("<extra_id_5>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        265: AddedToken("<extra_id_6>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        266: AddedToken("<extra_id_7>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        267: AddedToken("<extra_id_8>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        268: AddedToken("<extra_id_9>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        269: AddedToken("<extra_id_10>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        270: AddedToken("<extra_id_11>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        271: AddedToken("<extra_id_12>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        272: AddedToken("<extra_id_13>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        273: AddedToken("<extra_id_14>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        274: AddedToken("<extra_id_15>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        275: AddedToken("<extra_id_16>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        276: AddedToken("<extra_id_17>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        277: AddedToken("<extra_id_18>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        278: AddedToken("<extra_id_19>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        279: AddedToken("<extra_id_20>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        280: AddedToken("<extra_id_21>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        281: AddedToken("<extra_id_22>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        282: AddedToken("<extra_id_23>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        283: AddedToken("<extra_id_24>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        284: AddedToken("<extra_id_25>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        285: AddedToken("<extra_id_26>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        286: AddedToken("<extra_id_27>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        287: AddedToken("<extra_id_28>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        288: AddedToken("<extra_id_29>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        289: AddedToken("<extra_id_30>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        290: AddedToken("<extra_id_31>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        291: AddedToken("<extra_id_32>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        292: AddedToken("<extra_id_33>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        293: AddedToken("<extra_id_34>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        294: AddedToken("<extra_id_35>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        295: AddedToken("<extra_id_36>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        296: AddedToken("<extra_id_37>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        297: AddedToken("<extra_id_38>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        298: AddedToken("<extra_id_39>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        299: AddedToken("<extra_id_40>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        300: AddedToken("<extra_id_41>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        301: AddedToken("<extra_id_42>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        302: AddedToken("<extra_id_43>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        303: AddedToken("<extra_id_44>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        304: AddedToken("<extra_id_45>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        305: AddedToken("<extra_id_46>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        306: AddedToken("<extra_id_47>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        307: AddedToken("<extra_id_48>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        308: AddedToken("<extra_id_49>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        309: AddedToken("<extra_id_50>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        310: AddedToken("<extra_id_51>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        311: AddedToken("<extra_id_52>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        312: AddedToken("<extra_id_53>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        313: AddedToken("<extra_id_54>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        314: AddedToken("<extra_id_55>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        315: AddedToken("<extra_id_56>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        316: AddedToken("<extra_id_57>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        317: AddedToken("<extra_id_58>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        318: AddedToken("<extra_id_59>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        319: AddedToken("<extra_id_60>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        320: AddedToken("<extra_id_61>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        321: AddedToken("<extra_id_62>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        322: AddedToken("<extra_id_63>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        323: AddedToken("<extra_id_64>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        324: AddedToken("<extra_id_65>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        325: AddedToken("<extra_id_66>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        326: AddedToken("<extra_id_67>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        327: AddedToken("<extra_id_68>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        328: AddedToken("<extra_id_69>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        329: AddedToken("<extra_id_70>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        330: AddedToken("<extra_id_71>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        331: AddedToken("<extra_id_72>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        332: AddedToken("<extra_id_73>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        333: AddedToken("<extra_id_74>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        334: AddedToken("<extra_id_75>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        335: AddedToken("<extra_id_76>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        336: AddedToken("<extra_id_77>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        337: AddedToken("<extra_id_78>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        338: AddedToken("<extra_id_79>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        339: AddedToken("<extra_id_80>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        340: AddedToken("<extra_id_81>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        341: AddedToken("<extra_id_82>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        342: AddedToken("<extra_id_83>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        343: AddedToken("<extra_id_84>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        344: AddedToken("<extra_id_85>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        345: AddedToken("<extra_id_86>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        346: AddedToken("<extra_id_87>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        347: AddedToken("<extra_id_88>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        348: AddedToken("<extra_id_89>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        349: AddedToken("<extra_id_90>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        350: AddedToken("<extra_id_91>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        351: AddedToken("<extra_id_92>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        352: AddedToken("<extra_id_93>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        353: AddedToken("<extra_id_94>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        354: AddedToken("<extra_id_95>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        355: AddedToken("<extra_id_96>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        356: AddedToken("<extra_id_97>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        357: AddedToken("<extra_id_98>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        358: AddedToken("<extra_id_99>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        359: AddedToken("<extra_id_100>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        360: AddedToken("<extra_id_101>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        361: AddedToken("<extra_id_102>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        362: AddedToken("<extra_id_103>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        363: AddedToken("<extra_id_104>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        364: AddedToken("<extra_id_105>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        365: AddedToken("<extra_id_106>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        366: AddedToken("<extra_id_107>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        367: AddedToken("<extra_id_108>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        368: AddedToken("<extra_id_109>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        369: AddedToken("<extra_id_110>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        370: AddedToken("<extra_id_111>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        371: AddedToken("<extra_id_112>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        372: AddedToken("<extra_id_113>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        373: AddedToken("<extra_id_114>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        374: AddedToken("<extra_id_115>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        375: AddedToken("<extra_id_116>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        376: AddedToken("<extra_id_117>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        377: AddedToken("<extra_id_118>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        378: AddedToken("<extra_id_119>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        379: AddedToken("<extra_id_120>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        380: AddedToken("<extra_id_121>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        381: AddedToken("<extra_id_122>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        382: AddedToken("<extra_id_123>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        383: AddedToken("<extra_id_124>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        384: AddedToken("__pose__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        385: AddedToken("__gsg__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        386: AddedToken("__slf__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        387: AddedToken("__asq__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        388: AddedToken("__ssr__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        389: AddedToken("__ase__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        390: AddedToken("__ils__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        391: AddedToken("__sgg__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        392: AddedToken("__cse__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        393: AddedToken("__svk__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        394: AddedToken("__dse__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

{
  "processor_class": "Pose2TextTranslationProcessor",
  "reduce_holistic_poses": true
}

[INFO|modeling_utils.py:3675] 2025-02-15 10:12:42,334 >> loading weights file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model/model.safetensors
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:46,250 >> Generate config GenerationConfig {
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|configuration_utils.py:733] 2025-02-15 10:12:46,570 >> loading configuration file config.json from cache at /home/amoryo/data/.cache/huggingface/hub/models--google--byt5-small/snapshots/68377bdc18a2ffec8a0533fef03b1c513a4dd49d/config.json
[INFO|configuration_utils.py:800] 2025-02-15 10:12:46,571 >> Model config T5Config {
  "_name_or_path": "/home/patrick/t5/byt5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 3584,
  "d_kv": 64,
  "d_model": 1472,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "gradient_checkpointing": false,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 4,
  "num_heads": 6,
  "num_layers": 12,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "tokenizer_class": "ByT5Tokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "vocab_size": 384
}

[INFO|modeling_utils.py:3678] 2025-02-15 10:12:46,686 >> loading weights file pytorch_model.bin from cache at /home/amoryo/data/.cache/huggingface/hub/models--google--byt5-small/snapshots/68377bdc18a2ffec8a0533fef03b1c513a4dd49d/pytorch_model.bin
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:52,181 >> Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|modeling_utils.py:4507] 2025-02-15 10:12:52,229 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration.

[INFO|modeling_utils.py:4515] 2025-02-15 10:12:52,229 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at google/byt5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:993] 2025-02-15 10:12:52,362 >> loading configuration file generation_config.json from cache at /home/amoryo/data/.cache/huggingface/hub/models--google--byt5-small/snapshots/68377bdc18a2ffec8a0533fef03b1c513a4dd49d/generation_config.json
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:52,362 >> Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|modeling_utils.py:4507] 2025-02-15 10:12:52,396 >> All model checkpoint weights were used when initializing MultiModalEmbedderModel.

[INFO|modeling_utils.py:4515] 2025-02-15 10:12:52,396 >> All the weights of MultiModalEmbedderModel were initialized from the model checkpoint at /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MultiModalEmbedderModel for predictions without further training.
[INFO|configuration_utils.py:991] 2025-02-15 10:12:52,414 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model/generation_config.json
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:52,414 >> Generate config GenerationConfig {
  "eos_token_id": 1,
  "pad_token_id": 0
}

WARNING:multimodalhugs.tasks.run_translation:label_smoothing is enabled but the `prepare_decoder_input_ids_from_labels` method is not defined for `MultiModalEmbedderModel`. This will lead to loss being calculated twice and will take up more memory
train_dataset: Dataset({
    features: ['source', 'source_start', 'source_end', 'source_prompt', 'generation_prompt', 'output_text'],
    num_rows: 96404
})
[WARNING|trainer.py:598] 2025-02-15 10:12:56,477 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:648] 2025-02-15 10:12:56,477 >> Using auto half precision backend
INFO:multimodalhugs.tasks.run_translation:
MultiModalEmbedderModel(
  (vl_mapper): VLMapper(
    (layer_norm_before): LayerNorm((534,), eps=1e-05, elementwise_affine=True)
    (mapping_layer): Linear(in_features=534, out_features=1472, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (special_tokens_embeddings): SpecialTokensEmbeddings(
    (special_tokens_embeddings): CustomEmbedding(
      (old_embeddings): Embedding(384, 1472)
      (new_embeddings): Embedding(11, 1472)
    )
  )
  (backbone): T5ForConditionalGeneration(
    (shared): Embedding(384, 1472)
    (encoder): T5Stack(
      (embed_tokens): Embedding(384, 1472)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
                (relative_attention_bias): Embedding(32, 6)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1-11): 11 x T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (final_layer_norm): T5LayerNorm()
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (decoder): T5Stack(
      (embed_tokens): Embedding(384, 1472)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
                (relative_attention_bias): Embedding(32, 6)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerCrossAttention(
              (EncDecAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (2): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1-3): 3 x T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerCrossAttention(
              (EncDecAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (2): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (final_layer_norm): T5LayerNorm()
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (lm_head): Linear(in_features=1472, out_features=384, bias=False)
  )
)

INFO:multimodalhugs.tasks.run_translation:
Model Summary:
+--------------------------------+-------------------+---------------------------+
| Module Name                    | N_parameters      | N_training_parameters     |
+--------------------------------+-------------------+---------------------------+
| vl_mapper                      |           788,588 |                   788,588 |
| special_tokens_embeddings      |           581,440 |                   581,440 |
| backbone                       |       299,072,512 |               299,072,512 |
+--------------------------------+-------------------+---------------------------+

checkpoint: None
[INFO|trainer.py:2134] 2025-02-15 10:12:56,641 >> ***** Running training *****
[INFO|trainer.py:2135] 2025-02-15 10:12:56,641 >>   Num examples = 96,404
[INFO|trainer.py:2136] 2025-02-15 10:12:56,641 >>   Num Epochs = 67
[INFO|trainer.py:2137] 2025-02-15 10:12:56,641 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2140] 2025-02-15 10:12:56,641 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2141] 2025-02-15 10:12:56,641 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:2142] 2025-02-15 10:12:56,641 >>   Total optimization steps = 200,000
[INFO|trainer.py:2143] 2025-02-15 10:12:56,642 >>   Number of trainable parameters = 300,442,540
[INFO|integration_utils.py:807] 2025-02-15 10:12:56,643 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: amit_my to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.6
wandb: Run data is saved locally in /home/amoryo/sign-language/signwriting-transcription/wandb/run-20250215_101256-ogwdylf8
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run signwriting_transcription_model
wandb: ⭐️ View project at https://wandb.ai/amit_my/huggingface
wandb: 🚀 View run at https://wandb.ai/amit_my/huggingface/runs/ogwdylf8
  0%|                                                                                                                                                                                                   | 0/200000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/data/amoryo/conda/envs/multimodalhugs/bin/multimodalhugs-train", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/multimodalhugs_cli/train.py", line 25, in main
    translation_main()
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/tasks/run_translation.py", line 715, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 3318, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 3363, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/accelerate/utils/operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/accelerate/utils/operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/models/multimodal_embedder.py", line 524, in forward
    inputs_embeds, attention_mask =  self.special_tokens_embeddings(inputs_embeds, attention_mask, src_prompt, source_prompt_length_padding_mask)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/modules/special_tokens_embeddings.py", line 71, in forward
    src_prompt = self.special_tokens_embeddings(src_prompt)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/modules/custom_embedding.py", line 82, in forward
    old_embeds = self.old_embeddings(old_input_ids)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
wandb: 
wandb: 🚀 View run signwriting_transcription_model at: https://wandb.ai/amit_my/huggingface/runs/ogwdylf8
wandb: Find logs at: wandb/run-20250215_101256-ogwdylf8/logs
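From the traceback, the failing lookup is `self.old_embeddings(old_input_ids)` at `custom_embedding.py:82`, so `src_prompt` apparently reaches `SpecialTokensEmbeddings` as a float tensor (possibly a side effect of `--fp16` casting the whole batch). Untested, but a guard like the following before the lookup should confirm or sidestep it; `old_input_ids` is the name from the traceback, and the cast is a no-op for integer dtypes:

# Hypothetical guard for CustomEmbedding.forward (custom_embedding.py:82
# in the traceback). Assumption: the ids arrive as torch.cuda.FloatTensor
# under --fp16; embedding lookups require Long/Int indices.
if old_input_ids.is_floating_point():
    old_input_ids = old_input_ids.long()
old_embeds = self.old_embeddings(old_input_ids)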
GerrySant added the bug label Feb 15, 2025