Training script fails: Embedding indices is FloatTensor #14

Open
AmitMY opened this issue Feb 15, 2025 · 0 comments
Labels: bug (Something isn't working)

AmitMY (Contributor) commented Feb 15, 2025

Running the command:

multimodalhugs-train \
    --task "translation" \
    --model_name_or_path $MODEL_PATH \
    --processor_name_or_path $PROCESSOR_PATH \
    --run_name $MODEL_NAME \
    --dataset_dir $DATA_PATH \
    --output_dir $OUTPUT_PATH \
    --do_train True \
    --do_eval True \
    --fp16 \
    --label_smoothing_factor 0.1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --evaluation_strategy "steps" \
    --eval_steps 2000 \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 3 \
    --load_best_model_at_end true \
    --metric_for_best_model 'chrf' \
    --overwrite_output_dir \
    --gradient_accumulation_steps 4 \
    --learning_rate 1e-3 \
    --warmup_steps 20000 \
    --max_steps 200000 \
    --predict_with_generate True \
    --remove_unused_columns False

I get the following error:

RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
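For reference, this is the generic error PyTorch raises whenever a float tensor is used as embedding indices. A minimal standalone sketch (independent of multimodalhugs) reproduces the same failure:

import torch
import torch.nn as nn

emb = nn.Embedding(384, 1472)    # same shape as the model's old_embeddings
ids = torch.tensor([[3, 7, 42]])

emb(ids)          # works: indices are int64 (LongTensor)
emb(ids.float())  # RuntimeError: Expected tensor for argument #1 'indices' to have
                  # one of the following scalar types: Long, Int; but got
                  # torch.FloatTensor instead (while checking arguments for embedding)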

Full Log
/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
WARNING:multimodalhugs.tasks.run_translation:Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: True
INFO:multimodalhugs.tasks.run_translation:Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=2000,
eval_strategy=IntervalStrategy.STEPS,
eval_use_gather_object=False,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.1,
learning_rate=0.001,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/output/runs/Feb15_10-12-41_u20-cva0ts0-509,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_kwargs={},
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=200000,
metric_for_best_model=chrf,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=OptimizerNames.ADAMW_TORCH,
optim_args=None,
optim_target_modules=None,
output_dir=/scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/output,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=signwriting_transcription_model,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=2000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=3,
seed=42,
skip_memory_metrics=True,
sortish_sampler=False,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=20000,
weight_decay=0.0,
)
[INFO|configuration_utils.py:731] 2025-02-15 10:12:42,267 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model/config.json
[INFO|configuration_utils.py:800] 2025-02-15 10:12:42,268 >> Model config MultiModalEmbedderConfig(model_type='multimodal_embedder', feat_dim=534, feature_extractor_type=None, no_scale_embedding=False, pretrained_feature_extractor=None, freeze_feature_extractor=False, vl_mapper_type='linear', vl_mapper_layer_norm_before=True, vl_mapper_layer_norm=False, vl_mapper_activation=False, vl_factor=None, vl_mapper_dropout=0.1, freeze_vl_mapper=False, new_embeddings_vocab_size=11, backbone_used_vocab_size=384, init_lang_abbr='avg', freeze_new_embeddings=False, freeze_old_embeddings=False, backbone_name='t5', backbone_cfg=None, pretrained_backbone='google/byt5-small', freeze_backbone=False, encoder_embed_dim=1472, feature_extractor_cfg=None, is_encoder_decoder=True, pad_token_id=0, bos_token_id=None, eos_token_id=1, max_length=20)
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:42,269 >> Generate config GenerationConfig {
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|processing_utils.py:660] 2025-02-15 10:12:42,269 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/pose2text_translation_processor/processor_config.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,272 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,272 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,273 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2267] 2025-02-15 10:12:42,273 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2513] 2025-02-15 10:12:42,274 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:660] 2025-02-15 10:12:42,274 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/pose2text_translation_processor/processor_config.json
[WARNING|processing_utils.py:953] 2025-02-15 10:12:42,274 >> Some kwargs in processor config are unused and will not have any effect: reduce_holistic_poses. 
[INFO|processing_utils.py:722] 2025-02-15 10:12:42,276 >> Processor Pose2TextTranslationProcessor:
- tokenizer: ByT5Tokenizer(name_or_path='/scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/pose2text_translation_processor', vocab_size=256, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'additional_special_tokens': ['<extra_id_0>', '<extra_id_1>', '<extra_id_2>', '<extra_id_3>', '<extra_id_4>', '<extra_id_5>', '<extra_id_6>', '<extra_id_7>', '<extra_id_8>', '<extra_id_9>', '<extra_id_10>', '<extra_id_11>', '<extra_id_12>', '<extra_id_13>', '<extra_id_14>', '<extra_id_15>', '<extra_id_16>', '<extra_id_17>', '<extra_id_18>', '<extra_id_19>', '<extra_id_20>', '<extra_id_21>', '<extra_id_22>', '<extra_id_23>', '<extra_id_24>', '<extra_id_25>', '<extra_id_26>', '<extra_id_27>', '<extra_id_28>', '<extra_id_29>', '<extra_id_30>', '<extra_id_31>', '<extra_id_32>', '<extra_id_33>', '<extra_id_34>', '<extra_id_35>', '<extra_id_36>', '<extra_id_37>', '<extra_id_38>', '<extra_id_39>', '<extra_id_40>', '<extra_id_41>', '<extra_id_42>', '<extra_id_43>', '<extra_id_44>', '<extra_id_45>', '<extra_id_46>', '<extra_id_47>', '<extra_id_48>', '<extra_id_49>', '<extra_id_50>', '<extra_id_51>', '<extra_id_52>', '<extra_id_53>', '<extra_id_54>', '<extra_id_55>', '<extra_id_56>', '<extra_id_57>', '<extra_id_58>', '<extra_id_59>', '<extra_id_60>', '<extra_id_61>', '<extra_id_62>', '<extra_id_63>', '<extra_id_64>', '<extra_id_65>', '<extra_id_66>', '<extra_id_67>', '<extra_id_68>', '<extra_id_69>', '<extra_id_70>', '<extra_id_71>', '<extra_id_72>', '<extra_id_73>', '<extra_id_74>', '<extra_id_75>', '<extra_id_76>', '<extra_id_77>', '<extra_id_78>', '<extra_id_79>', '<extra_id_80>', '<extra_id_81>', '<extra_id_82>', '<extra_id_83>', '<extra_id_84>', '<extra_id_85>', '<extra_id_86>', '<extra_id_87>', '<extra_id_88>', '<extra_id_89>', '<extra_id_90>', '<extra_id_91>', '<extra_id_92>', '<extra_id_93>', '<extra_id_94>', '<extra_id_95>', '<extra_id_96>', '<extra_id_97>', '<extra_id_98>', '<extra_id_99>', '<extra_id_100>', '<extra_id_101>', '<extra_id_102>', '<extra_id_103>', '<extra_id_104>', '<extra_id_105>', '<extra_id_106>', '<extra_id_107>', '<extra_id_108>', '<extra_id_109>', '<extra_id_110>', '<extra_id_111>', '<extra_id_112>', '<extra_id_113>', '<extra_id_114>', '<extra_id_115>', '<extra_id_116>', '<extra_id_117>', '<extra_id_118>', '<extra_id_119>', '<extra_id_120>', '<extra_id_121>', '<extra_id_122>', '<extra_id_123>', '<extra_id_124>', '__pose__', '__gsg__', '__slf__', '__asq__', '__ssr__', '__ase__', '__ils__', '__sgg__', '__cse__', '__svk__', '__dse__']}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
        0: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        1: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        2: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        259: AddedToken("<extra_id_0>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        260: AddedToken("<extra_id_1>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        261: AddedToken("<extra_id_2>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        262: AddedToken("<extra_id_3>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        263: AddedToken("<extra_id_4>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        264: AddedToken("<extra_id_5>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        265: AddedToken("<extra_id_6>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        266: AddedToken("<extra_id_7>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        267: AddedToken("<extra_id_8>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        268: AddedToken("<extra_id_9>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        269: AddedToken("<extra_id_10>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        270: AddedToken("<extra_id_11>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        271: AddedToken("<extra_id_12>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        272: AddedToken("<extra_id_13>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        273: AddedToken("<extra_id_14>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        274: AddedToken("<extra_id_15>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        275: AddedToken("<extra_id_16>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        276: AddedToken("<extra_id_17>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        277: AddedToken("<extra_id_18>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        278: AddedToken("<extra_id_19>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        279: AddedToken("<extra_id_20>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        280: AddedToken("<extra_id_21>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        281: AddedToken("<extra_id_22>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        282: AddedToken("<extra_id_23>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        283: AddedToken("<extra_id_24>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        284: AddedToken("<extra_id_25>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        285: AddedToken("<extra_id_26>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        286: AddedToken("<extra_id_27>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        287: AddedToken("<extra_id_28>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        288: AddedToken("<extra_id_29>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        289: AddedToken("<extra_id_30>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        290: AddedToken("<extra_id_31>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        291: AddedToken("<extra_id_32>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        292: AddedToken("<extra_id_33>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        293: AddedToken("<extra_id_34>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        294: AddedToken("<extra_id_35>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        295: AddedToken("<extra_id_36>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        296: AddedToken("<extra_id_37>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        297: AddedToken("<extra_id_38>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        298: AddedToken("<extra_id_39>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        299: AddedToken("<extra_id_40>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        300: AddedToken("<extra_id_41>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        301: AddedToken("<extra_id_42>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        302: AddedToken("<extra_id_43>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        303: AddedToken("<extra_id_44>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        304: AddedToken("<extra_id_45>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        305: AddedToken("<extra_id_46>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        306: AddedToken("<extra_id_47>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        307: AddedToken("<extra_id_48>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        308: AddedToken("<extra_id_49>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        309: AddedToken("<extra_id_50>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        310: AddedToken("<extra_id_51>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        311: AddedToken("<extra_id_52>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        312: AddedToken("<extra_id_53>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        313: AddedToken("<extra_id_54>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        314: AddedToken("<extra_id_55>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        315: AddedToken("<extra_id_56>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        316: AddedToken("<extra_id_57>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        317: AddedToken("<extra_id_58>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        318: AddedToken("<extra_id_59>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        319: AddedToken("<extra_id_60>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        320: AddedToken("<extra_id_61>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        321: AddedToken("<extra_id_62>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        322: AddedToken("<extra_id_63>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        323: AddedToken("<extra_id_64>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        324: AddedToken("<extra_id_65>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        325: AddedToken("<extra_id_66>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        326: AddedToken("<extra_id_67>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        327: AddedToken("<extra_id_68>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        328: AddedToken("<extra_id_69>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        329: AddedToken("<extra_id_70>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        330: AddedToken("<extra_id_71>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        331: AddedToken("<extra_id_72>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        332: AddedToken("<extra_id_73>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        333: AddedToken("<extra_id_74>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        334: AddedToken("<extra_id_75>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        335: AddedToken("<extra_id_76>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        336: AddedToken("<extra_id_77>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        337: AddedToken("<extra_id_78>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        338: AddedToken("<extra_id_79>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        339: AddedToken("<extra_id_80>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        340: AddedToken("<extra_id_81>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        341: AddedToken("<extra_id_82>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        342: AddedToken("<extra_id_83>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        343: AddedToken("<extra_id_84>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        344: AddedToken("<extra_id_85>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        345: AddedToken("<extra_id_86>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        346: AddedToken("<extra_id_87>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        347: AddedToken("<extra_id_88>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        348: AddedToken("<extra_id_89>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        349: AddedToken("<extra_id_90>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        350: AddedToken("<extra_id_91>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        351: AddedToken("<extra_id_92>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        352: AddedToken("<extra_id_93>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        353: AddedToken("<extra_id_94>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        354: AddedToken("<extra_id_95>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        355: AddedToken("<extra_id_96>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        356: AddedToken("<extra_id_97>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        357: AddedToken("<extra_id_98>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        358: AddedToken("<extra_id_99>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        359: AddedToken("<extra_id_100>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        360: AddedToken("<extra_id_101>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        361: AddedToken("<extra_id_102>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        362: AddedToken("<extra_id_103>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        363: AddedToken("<extra_id_104>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        364: AddedToken("<extra_id_105>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        365: AddedToken("<extra_id_106>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        366: AddedToken("<extra_id_107>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        367: AddedToken("<extra_id_108>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        368: AddedToken("<extra_id_109>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        369: AddedToken("<extra_id_110>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        370: AddedToken("<extra_id_111>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        371: AddedToken("<extra_id_112>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        372: AddedToken("<extra_id_113>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        373: AddedToken("<extra_id_114>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        374: AddedToken("<extra_id_115>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        375: AddedToken("<extra_id_116>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        376: AddedToken("<extra_id_117>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        377: AddedToken("<extra_id_118>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        378: AddedToken("<extra_id_119>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        379: AddedToken("<extra_id_120>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        380: AddedToken("<extra_id_121>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        381: AddedToken("<extra_id_122>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        382: AddedToken("<extra_id_123>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        383: AddedToken("<extra_id_124>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        384: AddedToken("__pose__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        385: AddedToken("__gsg__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        386: AddedToken("__slf__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        387: AddedToken("__asq__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        388: AddedToken("__ssr__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        389: AddedToken("__ase__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        390: AddedToken("__ils__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        391: AddedToken("__sgg__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        392: AddedToken("__cse__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        393: AddedToken("__svk__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        394: AddedToken("__dse__", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

{
  "processor_class": "Pose2TextTranslationProcessor",
  "reduce_holistic_poses": true
}

[INFO|modeling_utils.py:3675] 2025-02-15 10:12:42,334 >> loading weights file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model/model.safetensors
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:46,250 >> Generate config GenerationConfig {
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|configuration_utils.py:733] 2025-02-15 10:12:46,570 >> loading configuration file config.json from cache at /home/amoryo/data/.cache/huggingface/hub/models--google--byt5-small/snapshots/68377bdc18a2ffec8a0533fef03b1c513a4dd49d/config.json
[INFO|configuration_utils.py:800] 2025-02-15 10:12:46,571 >> Model config T5Config {
  "_name_or_path": "/home/patrick/t5/byt5-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 3584,
  "d_kv": 64,
  "d_model": 1472,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "gradient_checkpointing": false,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 4,
  "num_heads": 6,
  "num_layers": 12,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "tokenizer_class": "ByT5Tokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "vocab_size": 384
}

[INFO|modeling_utils.py:3678] 2025-02-15 10:12:46,686 >> loading weights file pytorch_model.bin from cache at /home/amoryo/data/.cache/huggingface/hub/models--google--byt5-small/snapshots/68377bdc18a2ffec8a0533fef03b1c513a4dd49d/pytorch_model.bin
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:52,181 >> Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|modeling_utils.py:4507] 2025-02-15 10:12:52,229 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration.

[INFO|modeling_utils.py:4515] 2025-02-15 10:12:52,229 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at google/byt5-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:993] 2025-02-15 10:12:52,362 >> loading configuration file generation_config.json from cache at /home/amoryo/data/.cache/huggingface/hub/models--google--byt5-small/snapshots/68377bdc18a2ffec8a0533fef03b1c513a4dd49d/generation_config.json
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:52,362 >> Generate config GenerationConfig {
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0
}

[INFO|modeling_utils.py:4507] 2025-02-15 10:12:52,396 >> All model checkpoint weights were used when initializing MultiModalEmbedderModel.

[INFO|modeling_utils.py:4515] 2025-02-15 10:12:52,396 >> All the weights of MultiModalEmbedderModel were initialized from the model checkpoint at /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MultiModalEmbedderModel for predictions without further training.
[INFO|configuration_utils.py:991] 2025-02-15 10:12:52,414 >> loading configuration file /scratch/amoryo/tmp/signwriting-transcription/results/signwriting_transcription_model/trained_model/generation_config.json
[INFO|configuration_utils.py:1038] 2025-02-15 10:12:52,414 >> Generate config GenerationConfig {
  "eos_token_id": 1,
  "pad_token_id": 0
}

WARNING:multimodalhugs.tasks.run_translation:label_smoothing is enabled but the `prepare_decoder_input_ids_from_labels` method is not defined for `MultiModalEmbedderModel`. This will lead to loss being calculated twice and will take up more memory
train_dataset: Dataset({
    features: ['source', 'source_start', 'source_end', 'source_prompt', 'generation_prompt', 'output_text'],
    num_rows: 96404
})
[WARNING|trainer.py:598] 2025-02-15 10:12:56,477 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:648] 2025-02-15 10:12:56,477 >> Using auto half precision backend
INFO:multimodalhugs.tasks.run_translation:
MultiModalEmbedderModel(
  (vl_mapper): VLMapper(
    (layer_norm_before): LayerNorm((534,), eps=1e-05, elementwise_affine=True)
    (mapping_layer): Linear(in_features=534, out_features=1472, bias=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (special_tokens_embeddings): SpecialTokensEmbeddings(
    (special_tokens_embeddings): CustomEmbedding(
      (old_embeddings): Embedding(384, 1472)
      (new_embeddings): Embedding(11, 1472)
    )
  )
  (backbone): T5ForConditionalGeneration(
    (shared): Embedding(384, 1472)
    (encoder): T5Stack(
      (embed_tokens): Embedding(384, 1472)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
                (relative_attention_bias): Embedding(32, 6)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1-11): 11 x T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (final_layer_norm): T5LayerNorm()
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (decoder): T5Stack(
      (embed_tokens): Embedding(384, 1472)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
                (relative_attention_bias): Embedding(32, 6)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerCrossAttention(
              (EncDecAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (2): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
        (1-3): 3 x T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerCrossAttention(
              (EncDecAttention): T5Attention(
                (q): Linear(in_features=1472, out_features=384, bias=False)
                (k): Linear(in_features=1472, out_features=384, bias=False)
                (v): Linear(in_features=1472, out_features=384, bias=False)
                (o): Linear(in_features=384, out_features=1472, bias=False)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (2): T5LayerFF(
              (DenseReluDense): T5DenseGatedActDense(
                (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                (wo): Linear(in_features=3584, out_features=1472, bias=False)
                (dropout): Dropout(p=0.1, inplace=False)
                (act): NewGELUActivation()
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (final_layer_norm): T5LayerNorm()
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (lm_head): Linear(in_features=1472, out_features=384, bias=False)
  )
)

INFO:multimodalhugs.tasks.run_translation:
Model Summary:
+--------------------------------+-------------------+---------------------------+
| Module Name                    | N_parameters      | N_training_parameters     |
+--------------------------------+-------------------+---------------------------+
| vl_mapper                      |           788,588 |                   788,588 |
| special_tokens_embeddings      |           581,440 |                   581,440 |
| backbone                       |       299,072,512 |               299,072,512 |
+--------------------------------+-------------------+---------------------------+

checkpoint: None
[INFO|trainer.py:2134] 2025-02-15 10:12:56,641 >> ***** Running training *****
[INFO|trainer.py:2135] 2025-02-15 10:12:56,641 >>   Num examples = 96,404
[INFO|trainer.py:2136] 2025-02-15 10:12:56,641 >>   Num Epochs = 67
[INFO|trainer.py:2137] 2025-02-15 10:12:56,641 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2140] 2025-02-15 10:12:56,641 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2141] 2025-02-15 10:12:56,641 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:2142] 2025-02-15 10:12:56,641 >>   Total optimization steps = 200,000
[INFO|trainer.py:2143] 2025-02-15 10:12:56,642 >>   Number of trainable parameters = 300,442,540
[INFO|integration_utils.py:807] 2025-02-15 10:12:56,643 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: amit_my to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.6
wandb: Run data is saved locally in /home/amoryo/sign-language/signwriting-transcription/wandb/run-20250215_101256-ogwdylf8
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run signwriting_transcription_model
wandb: ⭐️ View project at https://wandb.ai/amit_my/huggingface
wandb: 🚀 View run at https://wandb.ai/amit_my/huggingface/runs/ogwdylf8
  0%|                                                                                                                                                                                                   | 0/200000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/data/amoryo/conda/envs/multimodalhugs/bin/multimodalhugs-train", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/multimodalhugs_cli/train.py", line 25, in main
    translation_main()
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/tasks/run_translation.py", line 715, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 3318, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/transformers/trainer.py", line 3363, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/accelerate/utils/operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/accelerate/utils/operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/models/multimodal_embedder.py", line 524, in forward
    inputs_embeds, attention_mask =  self.special_tokens_embeddings(inputs_embeds, attention_mask, src_prompt, source_prompt_length_padding_mask)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/modules/special_tokens_embeddings.py", line 71, in forward
    src_prompt = self.special_tokens_embeddings(src_prompt)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/multimodalhugs/modules/custom_embedding.py", line 82, in forward
    old_embeds = self.old_embeddings(old_input_ids)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 190, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/multimodalhugs/lib/python3.11/site-packages/torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
wandb: 
wandb: 🚀 View run signwriting_transcription_model at: https://wandb.ai/amit_my/huggingface/runs/ogwdylf8
wandb: Find logs at: wandb/run-20250215_101256-ogwdylf8/logs
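From the traceback, the failing lookup is `self.old_embeddings(old_input_ids)` at `custom_embedding.py:82`, so `src_prompt` apparently reaches `SpecialTokensEmbeddings` as a float tensor (possibly a side effect of `--fp16` casting the whole batch). Untested, but a guard like the following before the lookup should confirm or sidestep it; `old_input_ids` is the name from the traceback, and the cast is a no-op for integer dtypes:

# Hypothetical guard for CustomEmbedding.forward (custom_embedding.py:82
# in the traceback). Assumption: the ids arrive as torch.cuda.FloatTensor
# under --fp16; embedding lookups require Long/Int indices.
if old_input_ids.is_floating_point():
    old_input_ids = old_input_ids.long()
old_embeds = self.old_embeddings(old_input_ids)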
GerrySant added the bug label Feb 15, 2025