
Future Hand Prediction: Is the mask multiplied by the prediction we submit? #19

Open · masashi-hatano opened this issue Sep 8, 2022 · 10 comments


masashi-hatano commented Sep 8, 2022

I tried submitting a JSON file that follows the specified format, and I obtained the following quantitative results.

{"L_MDisp": 211.9670281732144, "R_MDisp": 276.70152097706693, "L_CDisp": 207.87675657595398, "R_CDisp": 271.26115030262525, "Total": 967.8064560288606}

However, even though the results we measured on the validation dataset were better than the baseline, the results obtained from the actual submission show very large errors.
This is probably because the mask is not multiplied with the predictions we submit. The mask should set the error to zero on frames in which the hand is not visible.
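For reference, here is a minimal sketch of the masked error I had in mind (hypothetical array values, not the actual evaluation code):

    # Minimal sketch of the masked error I had in mind (hypothetical values,
    # not the actual evaluation code).
    import numpy as np

    pred = np.array([[120.7, 84.7], [118.0, 86.1], [125.5, 88.1]])  # predicted (x, y) per frame
    gt   = np.array([[118.2, 83.9], [  0.0,  0.0], [124.8, 87.5]])  # ground-truth (x, y) per frame
    mask = np.array([1.0, 0.0, 1.0])  # 0 where the hand is not visible in the frame

    disp = np.linalg.norm(pred - gt, axis=1)  # per-frame Euclidean displacement
    masked_disp = disp * mask                 # frames without a visible hand contribute zero error
    print(masked_disp.sum())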

To demonstrate that the quantitative results above are anomalous, here is a prediction list taken from my submission.json file, along with its visualization.

As you can see from these figures, the quantitative results obtained from the actual submission seem to be incorrect, and the likely reason is that the loss is calculated without multiplying the predictions by the masks.

@VJWQ
Could you please confirm that the loss calculation is done correctly? In particular, I would appreciate it if you could check whether the error is set to zero for frames in which the hands are not visible.

"2152_3837": [120.74557495117188, 84.73670959472656, 235.1125030517578, 93.4263687133789, 118.04257202148438, 86.06185150146484, 230.081787109375, 91.89846801757812, 125.53624725341797, 88.14488220214844, 230.46359252929688, 94.43958282470703, 122.34292602539062, 88.79545593261719, 225.5665740966797, 91.5564193725586, 122.0747299194336, 94.3060531616211, 217.82423400878906, 99.28343963623047]

[Figure 1, frame 003838: pre_45 frame]
[Figure 2, frame 003853: pre_30 frame]
[Figure 3, frame 003868: pre_15 frame]
[Figure 4, frame 003883: pre_frame]
[Figure 5, frame 003898: contact_frame]


VJWQ commented Sep 8, 2022

Hi @masashi-hatano, happy to have you as our participant!
Things run as expected on my side. For your information, our baseline also gives a 1×20 non-zero prediction for sample "2152_3837", which means it is fine to have non-zero predictions on frames without hands. As we mentioned on the challenge page, "Our evaluation script won't penalize your algorithm if it gives predictions on frames without hands."
I suggest you revisit our sample evaluation code to understand how our metrics work. Specifically, you'll see how we filter out the out-of-frame hand case at L80 so that it won't influence your submission score. Please also make sure to use our script generate_submission.py to generate the submission file, and don't forget the num_clips=30 argument.
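Roughly, the idea behind that filtering looks like this (a simplified sketch with made-up data, not the verbatim evaluation code):

    # Simplified sketch of the filtering idea (made-up data, not the verbatim
    # evaluation code): frames whose ground truth marks the hand as out of
    # frame are skipped, so predictions there neither help nor hurt the score.
    def mean_displacement(preds, gts, gt_visible):
        """preds, gts: lists of (x, y); gt_visible: 0/1 flags from the annotations."""
        errors = [
            ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5
            for (px, py), (gx, gy), v in zip(preds, gts, gt_visible)
            if v  # out-of-frame hands are filtered out here
        ]
        return sum(errors) / len(errors) if errors else 0.0

    print(mean_displacement([(120.7, 84.7), (118.0, 86.1)], [(119.0, 85.0), (0.0, 0.0)], [1, 0]))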
Feel free to ask if you are still blocked! Happy to help :)


masashi-hatano commented Sep 9, 2022

@VJWQ
Thanks for your reply!
I solved this problem by using num_clips=30, and the evaluation was done correctly.

However, I don't really understand why num_clips is needed. According to the sample evaluation code, num_clips is used just for dividing the predicted values. I would appreciate it if you could explain it.


VJWQ commented Sep 9, 2022


Sure, here is the explanation: the number 30 comes from the line cfg.TEST.NUM_ENSEMBLE_VIEWS * cfg.TEST.NUM_SPATIAL_CROPS, which is an operation for better testing the robustness of the model over multiple views of each clip. In short, we need to divide by 30 when generating the submission file to obtain the average performance on each test clip.
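As a rough illustration with hypothetical numbers (assuming, for example, 10 temporal views × 3 spatial crops):

    # Rough illustration of multi-view testing (hypothetical values; assuming
    # NUM_ENSEMBLE_VIEWS = 10 temporal views and NUM_SPATIAL_CROPS = 3 crops).
    import numpy as np

    num_clips = 10 * 3  # cfg.TEST.NUM_ENSEMBLE_VIEWS * cfg.TEST.NUM_SPATIAL_CROPS

    rng = np.random.default_rng(0)
    per_view_preds = rng.normal(120.0, 2.0, size=(num_clips, 20))  # 30 views, 20 values each

    accumulated = per_view_preds.sum(axis=0)  # predictions accumulated across the 30 views
    clip_pred = accumulated / num_clips       # the /30 recovers the average 1x20 prediction
    print(clip_pred.shape)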


takfate commented Sep 11, 2022

@VJWQ
In generate_submission.py, num_clips does not seem to be used.


takfate commented Sep 12, 2022

@masashi-hatano
@VJWQ
Hello, I also have some questions.
I want to know what the validation loss is when training the baseline code.
I appended the following code

    for key in pred_dict:
        pred_dict[key] = pred_dict[key] / num_clips

after the multi-view accumulation.
But I also get evaluation results similar to those in your first comment.
I suspect there may be a problem with my data sampling.


masashi-hatano commented Sep 12, 2022

@takfate
Hello, this will probably help you.
If you evaluate your model using both generate_submission.py and eval.py, the predicted values will be divided by num_clips twice, so removing the division from either of them may solve your problem.
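As a quick illustration with made-up numbers:

    # Made-up numbers: dividing by num_clips in both scripts shrinks every
    # coordinate by an extra factor of 30.
    num_clips = 30
    accumulated_x = 30 * 120.0                     # sum of 30 per-view predictions of x = 120
    once = accumulated_x / num_clips               # 120.0 -> the intended average
    twice = accumulated_x / num_clips / num_clips  # 4.0   -> far from any real pixel position
    print(once, twice)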


takfate commented Sep 12, 2022

@masashi-hatano
I use generate_submission.py to generate a submission file for the test set and submit it to the EvalAI evaluation system.
Will the EvalAI evaluation system do another division by 30?


VJWQ commented Sep 12, 2022


Hi @takfate, your results will not be divided twice. In generate_submission.py, num_clips is just a placeholder and does not actually divide your results. That script sums all 30 prediction results for each clip, and the actual division happens in our evaluation script after you submit your results JSON file, where the /30 produces the average result for each clip. However, you still need to run python tools/generate_submission.py /path/to/output.pkl 30 to generate the submission file correctly.
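If it helps, here is a quick way to sanity-check a generated submission file (hypothetical path, not an official tool):

    # Quick sanity check on a generated submission file (hypothetical path, not
    # an official tool): every clip key should hold 20 values, and after the
    # server-side /30 described above the numbers should be on the scale of
    # pixel coordinates.
    import json

    with open("submission.json") as f:  # example path
        preds = json.load(f)

    for key, values in list(preds.items())[:3]:
        assert len(values) == 20, f"{key}: expected 20 values, got {len(values)}"
        print(key, [round(v / 30, 2) for v in values[:4]])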
@masashi-hatano @takfate
Do you mind posting the commands you use to generate the submission file? I can have a look at them to see why you are receiving similar results and adjust our guidance accordingly.

masashi-hatano commented

If so, it's fine on my side. Thanks anyway!


takfate commented Sep 13, 2022

@VJWQ @masashi-hatano
Our evaluation results are now normal. Thank you for your help.
