Evaluation consistency questions #21

Closed
OasisArtisan opened this issue Feb 3, 2025 · 2 comments

@OasisArtisan

Thanks for the great work.

In Table 1 of the paper, I noticed discrepancies between the numbers reported by ConceptGraphs and ConceptFusion and those in your table. What would account for these discrepancies?

Your table: [image]

ConceptGraphs table: [image]

Some questions:

  1. For Replica, ConceptFusion and ConceptGraphs remove the classes ("wall", "floor", "ceiling", "door", "window"). Do you also remove those classes in your evaluation?
  2. For each scene in Replica, which sequence do you use to perform your evaluation? Do you do a random walk yourself and generate the RGB-D sequence, use the NICE-SLAM processed sequence, or use the Semantic-NeRF processed RGB-D sequence?
  3. At what frame rate do you run your evaluation? Are there any frame skips?

Thanks again

@OasisArtisan (Author)

The discrepancy is also present for ScanNet, between the numbers reported in HOV-SG and in ConceptFusion.

[image]

@abwerby (Collaborator) commented Feb 9, 2025

Hello,
For the open-set semantic segmentation evaluation of ConceptFusion on the Replica and ScanNet datasets, we used their official online code, following their instructions and hyperparameters, to generate the feature maps, and then evaluated the results with the same evaluation criteria we applied to HOV-SG. We also noticed a performance gap compared to what is reported in their paper; you can check this issue: Link. Therefore, we reported the results that we obtained using their code.
ConceptGraphs uses privileged information when evaluating: by checking whether a category is part of a scene's ground truth, they filter out and suppress all predicted object categories that are not present in that scene's ground truth. This forces their method to select only categories that appear in the ground-truth categories of the scene. In contrast, our evaluation protocol considers all categories present in the entire dataset. A minimal sketch of the difference between the two protocols is given below.
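For illustration only, here is a minimal Python sketch of the two protocols, assuming a per-point similarity matrix `sim` between reconstructed points and text embeddings of the category names; the function names and the mIoU helper are hypothetical and not taken from either codebase.

```python
import numpy as np

def miou(pred_ids, gt_ids, class_ids):
    """Mean IoU over a fixed set of class ids (hypothetical helper)."""
    ious = []
    for c in class_ids:
        inter = np.logical_and(pred_ids == c, gt_ids == c).sum()
        union = np.logical_or(pred_ids == c, gt_ids == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Protocol described for ConceptGraphs above: predictions are restricted to
# the categories that occur in this scene's ground truth.
def eval_with_scene_prior(sim, gt_ids, num_classes, gt_classes_in_scene):
    # sim: (num_points, num_classes) similarity between point features and
    # text embeddings of every dataset category (assumed input).
    mask = np.isin(np.arange(num_classes), list(gt_classes_in_scene))
    sim_restricted = np.where(mask[None, :], sim, -np.inf)
    pred_ids = sim_restricted.argmax(axis=1)
    return miou(pred_ids, gt_ids, sorted(gt_classes_in_scene))

# Protocol described for HOV-SG above: argmax over every category in the
# dataset vocabulary, with no knowledge of which classes the scene contains.
def eval_over_all_classes(sim, gt_ids, num_classes):
    pred_ids = sim.argmax(axis=1)
    return miou(pred_ids, gt_ids, range(num_classes))
```

Restricting the argmax to the scene's ground-truth categories can only raise the per-scene scores, which is why the two protocols are not directly comparable.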

Q1:
Yes, we also remove the background classes; you can check it here: Link (sketched below).
Q2:
We used the NICE-SLAM processed sequence.
Q3:
We typically run our pipeline with a frame skip of 10 (sketched below); you can find the recommended hyperparameters here: Link.
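
As a small illustration of the answers to Q1 and Q3, the sketch below drops the background classes before computing per-class metrics and subsamples the RGB-D stream with a frame skip of 10; the variable and function names are assumptions, not the actual HOV-SG configuration, which lives in the linked files.

```python
# Hypothetical names; the real class list and hyperparameters used by HOV-SG
# are defined in the linked config, not reproduced here.
BACKGROUND_CLASSES = {"wall", "floor", "ceiling", "door", "window"}

def foreground_class_ids(class_names):
    """Indices of the classes kept for evaluation (background removed)."""
    return [i for i, name in enumerate(class_names) if name not in BACKGROUND_CLASSES]

def subsample_frames(frame_paths, skip=10):
    """Keep every `skip`-th RGB-D frame, e.g. skip=10 as mentioned above."""
    return frame_paths[::skip]

# Usage sketch (hypothetical variable names):
# eval_ids = foreground_class_ids(replica_class_names)
# frames = subsample_frames(sorted(all_frame_paths), skip=10)
```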

@abwerby closed this as completed Feb 9, 2025