(Work in progress) I need to correct the config; it should look like this:
backbone_name: "<backbone-model-type>"# Identifier for the pretrained backbone (e.g., "m2m100", "t5").pretrained_backbone: "<pretrained-backbone-weights>"# Weights or checkpoint identifier for the pretrained backbone. For instance "google/byt5-small"feat_dim: 534# Dimensionality of the features produced by the feature extractor if present, otherwise should be the dimensionality of the features that are inputed to the network.
For poses, let's say that each pose has the shape [t, people, d, xyz]; then feat_dim = d * xyz * people.
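To make the arithmetic concrete, here is a minimal numpy sketch (the frame count and the single-person case are made up for the example; the 178 keypoints with 3 coordinates correspond to the 534 used above):

```python
import numpy as np

# Hypothetical pose clip: t frames, 1 person, 178 keypoints, 3 coordinates (x, y, z) each.
t, people, d, xyz = 120, 1, 178, 3
poses = np.zeros((t, people, d, xyz), dtype=np.float32)

# feat_dim is the size of one flattened frame: people * d * xyz.
feat_dim = int(np.prod(poses.shape[1:]))
print(feat_dim)  # 534, matching feat_dim: 534 in the example config
```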
(Work in progress) Yeah, it should be better explained.
(Work in progress) The values under "training" in the example configuration were chosen without any particular criteria. The intention is to replace them with values obtained after training some models to a decent performance.
Also, there may be a bug in how training arguments are prioritized between those specified in the config and those specified in the training command, so for the moment it is recommended to specify the training hyperparameters directly in the multimodalhugs-train command.
Yes. The new tokens are used to extend the pretrained tokenizer, but the extended tokenizer is only used to create the new embeddings for the encoder (the new embeddings extend the pretrained backbone's embedding matrix).
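As a rough illustration of that extension step using plain Hugging Face transformers calls (the token strings are made-up examples, and this generic call resizes the model's shared embeddings and output head, whereas multimodalhugs only uses the extended vocabulary to build the new encoder-side embeddings):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Pretrained backbone and tokenizer (checkpoint name taken from the config example).
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/byt5-small")

# Extend the pretrained tokenizer with new tokens (hypothetical source-language tags).
new_tokens = ["__ase__", "__gsg__"]
tokenizer.add_tokens(new_tokens, special_tokens=True)

# Grow the embedding matrix so the added tokens get freshly initialised rows,
# while the rows of the original vocabulary keep their pretrained weights.
model.resize_token_embeddings(len(tokenizer))
```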
(Work in progress) Regarding this behaviour of excluding the generation prompt, fixing it has now been identified as a priority.
(Work in progress) Finally, it is planned to rename the tokenizer_src_langs_path parameter to new_vocabulary in the near future to better reflect the way it works.
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L12-L14
It is not super clear to me what the valid values are. Are these the valid ones?
feat_dim
- maybe explain in the comment that it is the pose, for example 178 points with 3 dimensions

I also think this should be optional:
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L19
Is this actually float16 or bfloat16?
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L48
Learning rate seems too high
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L31
Does this only modify the src tokenizer?
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L56
The name suggests so, but you do allow generation_prompt, which is in the output tokenizer.