Is it intentional or a mistake that coco_proposals.json and coco_pseudo_4764.json are completely identical. #15

Bilibilee · 2024-03-14T12:31:57Z

In this Drive, is it intentional or a mistake that coco_proposals.json and coco_pseudo_4764.json are completely identical?

wusize · 2024-03-14T12:44:57Z

It's intentional. We need to make sure both methods use the same set of region proposals to fairly verify that self-distillation is better than noisy region-text pairs. Kindly note that category ids were not used during CLIPSelf training even though they are in the JSON.

kinredon · 2024-04-02T15:47:39Z

@wusize Hi, I am curious about how to obtain the region proposals and corresponding category ids when the model is only trained on the base categories? I found the category ids in coco_proposals.json are numerous.

wusize · 2024-04-02T18:51:08Z

Hi! Please refer to A.4 in the appendix of the paper. You can also have a look at the data preparation of VLDet or RegionCLIP.

kinredon · 2024-04-03T05:19:01Z

@wusize Thanks for your quick reply. I have read Appendix A.4 of the paper and checked the data preparation of VLDet, but there is no information about how to generate the region proposals.

I would like to use coco_proposals.json in my project, so I need to understand how it is generated. Can you provide some information on how to obtain coco_proposals.json, or where you downloaded it?

Great thanks again!

wusize · 2024-04-03T06:05:30Z

1. Train an RPN on the base categories of COCO, or take the RPN part of any off-the-shelf open-vocabulary detector trained on COCO.
2. Use the RPN to generate proposals.
3. Extract CLIP image embeddings for these proposals.
4. Parse each COCO caption into a group of nouns or phrases.
5. Extract CLIP text embeddings for these nouns/phrases.
6. Do bipartite matching between the image embeddings and the text embeddings.
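
The final matching step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used to build the released files: the embeddings are assumed to be already extracted as NumPy arrays, the function name `match_regions_to_phrases` and the `min_similarity` threshold are hypothetical, and cosine similarity with Hungarian matching (`scipy.optimize.linear_sum_assignment`) is one common way to do bipartite matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_regions_to_phrases(image_embeds, text_embeds, min_similarity=0.0):
    """One-to-one matching of region embeddings to phrase embeddings.

    image_embeds: (num_regions, dim) array, one row per proposal.
    text_embeds:  (num_phrases, dim) array, one row per noun/phrase.
    Returns a list of (region_index, phrase_index, cosine_similarity).
    """
    image_embeds = np.asarray(image_embeds, dtype=float)
    text_embeds = np.asarray(text_embeds, dtype=float)

    # L2-normalise so that the dot product equals cosine similarity.
    img = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sim = img @ txt.T  # shape: (num_regions, num_phrases)

    # Hungarian matching; maximize=True selects the assignment with the
    # highest total similarity. Rectangular matrices are supported, so
    # surplus regions (or phrases) are simply left unmatched.
    rows, cols = linear_sum_assignment(sim, maximize=True)

    # Hypothetical filter: drop weak matches. Whether the released files
    # were filtered like this is not stated in this thread.
    return [
        (int(r), int(c), float(sim[r, c]))
        for r, c in zip(rows, cols)
        if sim[r, c] >= min_similarity
    ]


if __name__ == "__main__":
    # Toy 2-D stand-ins for CLIP embeddings: 3 regions, 2 phrases.
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    phrases = [[0.0, 1.0], [1.0, 0.0]]
    for region, phrase, score in match_regions_to_phrases(regions, phrases):
        print(region, phrase, round(score, 3))
```

Because the matching is one-to-one, an image with more proposals than parsed phrases leaves most proposals without a label, which is one reason the number of labelled regions can be much smaller than the number of proposals.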

kinredon · 2024-04-03T06:47:13Z

I got it. Thanks!

kinredon · 2024-09-20T16:08:50Z

@wusize Hi, when I checked the generated proposals in coco_pseudo_4764.json, I found many differences between its category ids and those in VLDet. For example:

The number of category ids is smaller than in VLDet, so the last step (6) does not seem to be a simple bipartite matching. Do you apply any filtering? I hope you can give me some suggestions. Many thanks.
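
A comparison like this can be made concrete by counting the category ids in each file. The snippet below is a minimal sketch, assuming both files follow the usual COCO annotation layout (a top-level `"annotations"` list whose entries carry a `"category_id"`); the helper names are hypothetical, and the toy dictionaries stand in for the real files.

```python
import json
from collections import Counter


def count_category_ids(coco):
    """Count annotations per category id in a COCO-style dict."""
    return Counter(ann["category_id"] for ann in coco.get("annotations", []))


def load_counts(path):
    """Load a COCO-style JSON file and count its category ids."""
    with open(path) as f:
        return count_category_ids(json.load(f))


if __name__ == "__main__":
    # Toy stand-ins; replace with load_counts("coco_pseudo_4764.json")
    # and the corresponding VLDet file to compare the real data.
    pseudo = count_category_ids(
        {"annotations": [{"category_id": 1}, {"category_id": 1}, {"category_id": 3}]}
    )
    other = count_category_ids(
        {"annotations": [{"category_id": 1}, {"category_id": 2}, {"category_id": 3}]}
    )
    print("distinct ids:", len(pseudo), "vs", len(other))
    print("ids only in the second file:", sorted(set(other) - set(pseudo)))
```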
