Evaluation consistency questions #21

Closed
OasisArtisan opened this issue Feb 3, 2025 · 2 comments

@OasisArtisan

Thanks for the great work.

In Table 1 of the paper, I noticed discrepancies between the numbers reported by ConceptGraphs and ConceptFusion and those in your table. What would account for these discrepancies?

Your table: [image]

ConceptGraphs table: [image]

Some questions:

  1. For Replica, ConceptFusion and ConceptGraphs remove the classes ("wall", "floor", "ceiling", "door", "window"). Do you also remove those classes in your evaluation?
  2. For each scene in Replica, which sequence do you use to perform your evaluation? Do you do a random walk yourself and generate the RGB-D sequence, use the NICE-SLAM processed sequence, or use the Semantic-NeRF processed RGB-D sequence?
  3. At what frame rate do you run your evaluation? Are there any frame skips?

Thanks again

@OasisArtisan (Author)

The discrepancy is also present for ScanNet, between the numbers reported in HOV-SG and in ConceptFusion.

[image]

@abwerby (Collaborator) commented Feb 9, 2025

Hello,
For the open-set semantic segmentation evaluation of ConceptFusion on the Replica and ScanNet datasets, we used their official online code, following their instructions and hyperparameters, to generate the feature maps, and then evaluated the results with the same evaluation criteria we applied to HOV-SG. We also noticed a performance gap compared to what is reported in their paper; you can check this issue: Link. Therefore, we reported the results that we obtained using their code.
ConceptGraphs uses privileged information when evaluating: by checking whether a category is part of a scene's ground truth, they filter out and suppress all predicted object categories that are not present in that scene's ground truth. This forces their method to select only categories that appear in the ground-truth categories of the scene. In contrast, our evaluation protocol considers all categories present in the entire dataset. A minimal sketch of the difference between the two protocols is given below.
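For illustration only, here is a minimal Python sketch of the two protocols, assuming a per-point similarity matrix `sim` between reconstructed points and text embeddings of the category names; the function names and the mIoU helper are hypothetical and not taken from either codebase.

```python
import numpy as np

def miou(pred_ids, gt_ids, class_ids):
    """Mean IoU over a fixed set of class ids (hypothetical helper)."""
    ious = []
    for c in class_ids:
        inter = np.logical_and(pred_ids == c, gt_ids == c).sum()
        union = np.logical_or(pred_ids == c, gt_ids == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Protocol described for ConceptGraphs above: predictions are restricted to
# the categories that occur in this scene's ground truth.
def eval_with_scene_prior(sim, gt_ids, num_classes, gt_classes_in_scene):
    # sim: (num_points, num_classes) similarity between point features and
    # text embeddings of every dataset category (assumed input).
    mask = np.isin(np.arange(num_classes), list(gt_classes_in_scene))
    sim_restricted = np.where(mask[None, :], sim, -np.inf)
    pred_ids = sim_restricted.argmax(axis=1)
    return miou(pred_ids, gt_ids, sorted(gt_classes_in_scene))

# Protocol described for HOV-SG above: argmax over every category in the
# dataset vocabulary, with no knowledge of which classes the scene contains.
def eval_over_all_classes(sim, gt_ids, num_classes):
    pred_ids = sim.argmax(axis=1)
    return miou(pred_ids, gt_ids, range(num_classes))
```

Restricting the argmax to the scene's ground-truth categories can only raise the per-scene scores, which is why the two protocols are not directly comparable.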

Q1:
Yes, we also remove the background classes; you can check it here: Link (sketched below).
Q2:
We used the NICE-SLAM processed sequence.
Q3:
We typically run our pipeline with a frame skip of 10 (sketched below); you can find the recommended hyperparameters here: Link.
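
As a small illustration of the answers to Q1 and Q3, the sketch below drops the background classes before computing per-class metrics and subsamples the RGB-D stream with a frame skip of 10; the variable and function names are assumptions, not the actual HOV-SG configuration, which lives in the linked files.

```python
# Hypothetical names; the real class list and hyperparameters used by HOV-SG
# are defined in the linked config, not reproduced here.
BACKGROUND_CLASSES = {"wall", "floor", "ceiling", "door", "window"}

def foreground_class_ids(class_names):
    """Indices of the classes kept for evaluation (background removed)."""
    return [i for i, name in enumerate(class_names) if name not in BACKGROUND_CLASSES]

def subsample_frames(frame_paths, skip=10):
    """Keep every `skip`-th RGB-D frame, e.g. skip=10 as mentioned above."""
    return frame_paths[::skip]

# Usage sketch (hypothetical variable names):
# eval_ids = foreground_class_ids(replica_class_names)
# frames = subsample_frames(sorted(all_frame_paths), skip=10)
```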

@abwerby closed this as completed Feb 9, 2025