(Work in progress) I need to correct the config; it should look like this:
backbone_name: "<backbone-model-type>"# Identifier for the pretrained backbone (e.g., "m2m100", "t5").pretrained_backbone: "<pretrained-backbone-weights>"# Weights or checkpoint identifier for the pretrained backbone. For instance "google/byt5-small"feat_dim: 534# Dimensionality of the features produced by the feature extractor if present, otherwise should be the dimensionality of the features that are inputed to the network.
For poses, let's say that each pose has the shape [t, people, d, xyz]; then feat_dim = d * xyz * people.
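To make the arithmetic concrete, here is a minimal numpy sketch (the frame count and the single-person case are made up for the example; the 178 keypoints with 3 coordinates correspond to the 534 used above):

```python
import numpy as np

# Hypothetical pose clip: t frames, 1 person, 178 keypoints, 3 coordinates (x, y, z) each.
t, people, d, xyz = 120, 1, 178, 3
poses = np.zeros((t, people, d, xyz), dtype=np.float32)

# feat_dim is the size of one flattened frame: people * d * xyz.
feat_dim = int(np.prod(poses.shape[1:]))
print(feat_dim)  # 534, matching feat_dim: 534 in the example config
```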
(Work in progress) Yeah, it should be better explained.
(Work in progress) The values under "training" in the example configuration were chosen without any particular criteria. The intention is to replace them with values obtained after training some models to a decent performance.
Also, there may be a bug in how training arguments are prioritized between those specified in the config and those specified in the training command, so for the moment it is recommended to specify the training hyperparameters directly in the multimodalhugs-train command.
Yes. The new tokens are used to extend the pretrained tokenizer, but the extended tokenizer is only used to create the new embeddings for the encoder (the new embeddings extend the pretrained backbone's embedding matrix).
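As a rough illustration of that extension step using plain Hugging Face transformers calls (the token strings are made-up examples, and this generic call resizes the model's shared embeddings and output head, whereas multimodalhugs only uses the extended vocabulary to build the new encoder-side embeddings):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Pretrained backbone and tokenizer (checkpoint name taken from the config example).
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/byt5-small")

# Extend the pretrained tokenizer with new tokens (hypothetical source-language tags).
new_tokens = ["__ase__", "__gsg__"]
tokenizer.add_tokens(new_tokens, special_tokens=True)

# Grow the embedding matrix so the added tokens get freshly initialised rows,
# while the rows of the original vocabulary keep their pretrained weights.
model.resize_token_embeddings(len(tokenizer))
```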
(Work in progress) Regarding this behaviour of excluding the generation prompt, fixing it has now been identified as a priority.
(Work in progress) Finally, it is planned to rename the tokenizer_src_langs_path parameter to new_vocabulary in the near future to better reflect the way it works.
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L12-L14
It is not super clear to me what the valid values are. Are these the valid ones?
feat_dim
- maybe explain in the comment that it is the pose, for example 178 points with 3 dimensions

I also think this should be optional:
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L19
Is this actually float16 or bfloat16?
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L48
Learning rate seems too high
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L31
Does this only modify the src tokenizer?
https://github.com/GerrySant/multimodalhugs/blob/master/examples/multimodal_translation/pose2text_translation/configs/example_config.yaml#L56
The name suggests so, but you do allow generation_prompt, which is in the output tokenizer.