Is it intentional or a mistake that coco_proposals.json and coco_pseudo_4764.json are completely identical? #15

Open
Bilibilee opened this issue Mar 14, 2024 · 7 comments

Comments

@Bilibilee

In this Drive, is it intentional or a mistake that coco_proposals.json and coco_pseudo_4764.json are completely identical?

@wusize
Owner

wusize commented Mar 14, 2024

It's intentional. We need to make sure both methods use the same set of region proposals so that we can fairly verify that self-distillation is better than noisy region-text pairs. Kindly note that category ids were not used during CLIPSelf training even though they are in the JSON.
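
For readers who want to confirm that two annotation files share the same proposals regardless of labels, here is a minimal sketch. It assumes both files follow the usual COCO layout (a top-level `annotations` list whose entries carry `image_id`, `bbox`, and `category_id`); that layout and the helper names `load_boxes` and `same_proposals` are assumptions for illustration, not something taken from the repository.

```python
import json


def load_boxes(path):
    """Return the set of (image_id, bbox) pairs in a COCO-style annotation file."""
    with open(path) as f:
        data = json.load(f)
    # category_id is deliberately ignored: only the box geometry is compared.
    return {(ann["image_id"], tuple(ann["bbox"])) for ann in data["annotations"]}


def same_proposals(path_a, path_b):
    """True if both files contain exactly the same boxes on the same images."""
    return load_boxes(path_a) == load_boxes(path_b)
```

Calling `same_proposals("coco_proposals.json", "coco_pseudo_4764.json")` would then report whether the box sets match even if the category ids were to differ.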

@kinredon

kinredon commented Apr 2, 2024

@wusize Hi, how are the region proposals and their corresponding category ids obtained when the model is trained only on the base categories? I found that coco_proposals.json contains a large number of category ids.

@wusize
Owner

wusize commented Apr 2, 2024

Hi! Please refer to A.4 in the appendix of the paper. You can also have a look at the data preparation of VLDet or RegionCLIP.

@kinredon

kinredon commented Apr 3, 2024

@wusize Thanks for your quick reply. I have read Appendix A.4 of the paper and checked the data preparation of VLDet, but neither explains how to generate the region proposals.

I would like to use coco_proposals.json in my own project, so I need to understand how it was generated. Could you explain how to produce coco_proposals.json, or where you downloaded it?

Many thanks again!

@wusize
Owner

wusize commented Apr 3, 2024

  1. Train an RPN on the base categories of COCO, or take the RPN part of any off-the-shelf open-vocabulary (OV) detector trained on COCO.
  2. Use the RPN to generate proposals.
  3. Extract CLIP image embeddings for these proposals.
  4. Parse each COCO caption into a group of nouns or phrases.
  5. Extract CLIP text embeddings for these nouns/phrases.
  6. Do bipartite matching between the image embeddings and text embeddings.
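
Step 6 can be sketched as follows. This is an illustrative sketch only, not the authors' code: it assumes the region and phrase embeddings from steps 3 and 5 are already available as NumPy arrays, uses cosine similarity as the matching score, and solves the assignment with SciPy's `linear_sum_assignment`. The function name `match_regions_to_phrases` is hypothetical, and any thresholding or filtering applied afterwards is not covered.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_regions_to_phrases(region_emb, phrase_emb):
    """One-to-one matching of region embeddings to phrase embeddings.

    region_emb: (R, D) array of image-region embeddings.
    phrase_emb: (P, D) array of noun/phrase text embeddings.
    Returns a list of (region_index, phrase_index, cosine_similarity).
    """
    # L2-normalise so that the dot product equals cosine similarity.
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    p = phrase_emb / np.linalg.norm(phrase_emb, axis=1, keepdims=True)
    sim = r @ p.T  # shape (R, P)
    # linear_sum_assignment minimises total cost, so negate the similarity.
    rows, cols = linear_sum_assignment(-sim)
    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows, cols)]
```

With more regions than phrases (the usual case for one image and its caption), only `min(R, P)` pairs are returned, so most proposals are left without a phrase.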

@kinredon

kinredon commented Apr 3, 2024

I got it. Thanks!

@kinredon

@wusize Hi, when I checked the generated proposals in coco_pseudo_4764.json, I found that its category ids differ in many places from those in VLDet. For example:

[image]

The number of category ids is smaller than in VLDet, so the last step (6) does not appear to be a simple bipartite matching. Do you apply any filtering? I would appreciate any suggestions. Many thanks!
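
A comparison like the one above can be reproduced by counting annotations per category id in each file. The sketch below assumes a COCO-style file with a top-level `annotations` list whose entries have a `category_id` field; the helper name `category_histogram` is hypothetical.

```python
import json
from collections import Counter


def category_histogram(path):
    """Count annotations per category_id in a COCO-style annotation file."""
    with open(path) as f:
        data = json.load(f)
    return Counter(ann["category_id"] for ann in data["annotations"])
```

For example, `len(category_histogram("coco_pseudo_4764.json"))` gives the number of distinct category ids, which can be compared against the same count for the VLDet annotation file.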
