Foreground Scale for Any V2 #255

Open
math-artist opened this issue Nov 21, 2024 · 8 comments

@math-artist

First, thank you for this super useful app. I only started using iw3 yesterday, and I have already converted over one hundred files (small ones, many of them test samples).

During my testing, I found it annoying that the Any_V2 model seemed truncated when the foreground scale was set to 1, so I plotted the curves to see what happens.

image

I use Any V2 in another project. In its depth maps, values close to 0 are the farthest and higher values are closer. So what foreground scale actually does is flatten the background in exchange for a very small gain in slope near 1, and that is exactly what I see when I use it. I think it's implemented backward.

I wrote this code, derived from one of your functions. I think it gives the correct transform for Any V2.

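import torch

def inv_softplus01_edited(x, bias, scale):
    # Inverse softplus, log(exp(z) - 1), at both ends of the [0, 1] range (used for normalization).
    min_v = ((torch.zeros(1, dtype=x.dtype, device=x.device) - bias) * scale).expm1().clamp(min=1e-6).log()
    max_v = ((torch.ones(1, dtype=x.dtype, device=x.device) - bias) * scale).expm1().clamp(min=1e-6).log()
    # Flip the input (1 - x) so the steep part of the curve lands on near depths, then flip the output back.
    v = ((1 - x - bias) * scale).expm1().clamp(min=1e-6).log()
    return 1 - (v - min_v) / (max_v - min_v)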

image
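To inspect the shape of this curve without a GPU or torch installed, here is a minimal scalar sketch using only the standard library. The name `inv_softplus01_scalar` is hypothetical, and the parameter values are an assumption borrowed from the `mul_1` values posted further down in this thread. It is meant for checking the curve, not as a replacement for the torch code.

```python
import math

def inv_softplus01_scalar(x, bias, scale):
    """Scalar, standard-library version of inv_softplus01_edited (for inspection only)."""
    def inv_softplus(z):
        # log(exp(z) - 1), with the same 1e-6 clamp as the torch code
        return math.log(max(math.expm1(z), 1e-6))

    min_v = inv_softplus((0.0 - bias) * scale)
    max_v = inv_softplus((1.0 - bias) * scale)
    v = inv_softplus((1.0 - x - bias) * scale)
    return 1.0 - (v - min_v) / (max_v - min_v)

# bias/scale are assumed values taken from the "mul_1" entry later in the thread.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{x:.2f} -> {inv_softplus01_scalar(x, bias=-0.15, scale=4):.3f}")
```

The endpoints stay fixed (0 maps to 0, 1 maps to 1), and the curve is steeper toward 1, so near depths get more of the output range while far depths keep a non-zero slope.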

@nagadomi
Owner

Thanks for the info.
I also thought the current conversion curve for Depth-Anything was not good, but since I don't use it myself, I left it alone for a long time.
As you say, the current expression is just a smoothed version of x > 0.5 ? (x - 0.5) * 2 : 0.
I will take this opportunity to sort out my understanding of that area.
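For reference, the non-smooth form of the expression quoted above can be written out directly. This is only an illustration of that piecewise expression; the name `hard_ramp` is hypothetical and is not a function in iw3.

```python
def hard_ramp(x):
    """Piecewise form quoted above: 0 for the lower half, linear for the upper half."""
    return (x - 0.5) * 2 if x > 0.5 else 0.0

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, hard_ramp(x))
```

With the depth convention described in the first comment (0 is far, 1 is near), every depth at or below 0.5 maps to 0, which matches the flattened background reported there.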

nagadomi added the iw3 label Nov 22, 2024
@andy500

andy500 commented Dec 19, 2024

Where do I put this code? Thank you.

@math-artist
Author

math-artist commented Dec 19, 2024

This one is tricky because it needs to be changed in more than one place.

The code above was just an example, but it's a replacement for softplus01 in mapper.py.

So, in mapper.py, comment out the old function and put this in its place:

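def softplus01(x, bias, scale):
    # Inverse softplus, log(exp(z) - 1), at both ends of the [0, 1] range (used for normalization).
    min_v = ((torch.zeros(1, dtype=x.dtype, device=x.device) - bias) * scale).expm1().clamp(min=1e-6).log()
    max_v = ((torch.ones(1, dtype=x.dtype, device=x.device) - bias) * scale).expm1().clamp(min=1e-6).log()
    # Flip the input (1 - x) so the steep part of the curve lands on near depths, then flip the output back.
    v = ((1 - x - bias) * scale).expm1().clamp(min=1e-6).log()
    return 1 - (v - min_v) / (max_v - min_v)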

Still in mapper.py, inside the function resolve_mapper_function, you have to replace the bias and scale values. Look for the condition elif name in {"mul_1", "mul_2", "mul_3"}: and replace param:

    elif name in {"mul_1", "mul_2", "mul_3"}:
        param = {
            # none 1x
            "mul_1": {"bias": -0.15, "scale": 4},  # smooth 1.5x
            "mul_2": {"bias": -0.08, "scale": 5},  # smooth 2x
            "mul_3": {"bias": -0.04, "scale": 6},  # smooth 3x
        }[name]

This is what I currently use after running some tests, but it could be improved. It's better than the former function, which flattened the background too much.
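To compare the three parameter sets above, the following sketch measures the average slope of the curve at the background end and at the foreground end. It uses a scalar, standard-library stand-in for the replacement softplus01; the helper names `curve` and `slope` are hypothetical, and the 10% windows are an arbitrary choice for illustration.

```python
import math

# Presets copied from the snippet above.
PRESETS = {
    "mul_1": {"bias": -0.15, "scale": 4},
    "mul_2": {"bias": -0.08, "scale": 5},
    "mul_3": {"bias": -0.04, "scale": 6},
}

def curve(x, bias, scale):
    """Scalar stand-in for the replacement softplus01 (standard library only)."""
    def inv_softplus(z):
        return math.log(max(math.expm1(z), 1e-6))
    min_v = inv_softplus((0.0 - bias) * scale)
    max_v = inv_softplus((1.0 - bias) * scale)
    v = inv_softplus((1.0 - x - bias) * scale)
    return 1.0 - (v - min_v) / (max_v - min_v)

def slope(x0, x1, **params):
    """Average slope of the curve between x0 and x1."""
    return (curve(x1, **params) - curve(x0, **params)) / (x1 - x0)

for name, params in PRESETS.items():
    far = slope(0.0, 0.1, **params)   # background end (depth near 0)
    near = slope(0.9, 1.0, **params)  # foreground end (depth near 1)
    print(f"{name}: background slope {far:.2f}, foreground slope {near:.2f}")
```

A slope of 1 means depth passes through unchanged. With these presets the foreground slope should rise from mul_1 to mul_3, while the background slope stays only moderately below 1 instead of collapsing toward 0.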

However, I run the app most of the time with foreground scale = 0, and sometimes even -1, and I crank up the 3D strength instead. I did not see a flatter foreground with Depth Anything V2, but I know that the metric model and the original model behave quite differently. I don't use the metric model.

Regarding my method of using negative foreground scale and higher 3D strength:

The artifacts from high divergence are caused by large pixel displacement, which is always greater on closer objects. This is how 3D vision works. Distant objects will not cause artifacts. Lowering the foreground scale to a negative value until the closest objects have the proper depth compensates for the high 3D strength, so you get proper depth for objects that pop out while also getting deeper, more distant backgrounds.
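The first point, that displacement grows with nearness and with 3D strength, can be illustrated with a toy model. This is an assumption made for illustration only: a simple linear relation with hypothetical names (`shift_px`, `strength_percent`, `width_px`), not iw3's actual warping code.

```python
def shift_px(depth, strength_percent, width_px):
    """Toy model: horizontal shift grows linearly with nearness (depth) and strength."""
    return depth * (strength_percent / 100.0) * width_px

width = 1920  # hypothetical frame width in pixels
for strength in (2.0, 4.0):
    far = shift_px(0.1, strength, width)   # distant object (depth near 0)
    near = shift_px(1.0, strength, width)  # closest object (depth = 1)
    print(f"strength {strength}: far {far:.1f}px, near {near:.1f}px")
```

In this model, doubling the strength doubles every shift, but the absolute increase is largest for the nearest objects, which is where the artifacts show up first.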

@andy500

andy500 commented Dec 19, 2024

Thank you, I will try this.

@francdn

francdn commented Dec 20, 2024

@math-artist What value do you use for 3D strength?

@andy500

andy500 commented Dec 20, 2024

1.5, and I use row_flow_v3_sym. It works very well.

@francdn

francdn commented Dec 20, 2024

I was asking @math-artist since he said "I crank up the 3D strength instead".

@Reno-CZ

Reno-CZ commented Jan 22, 2025

row_flow_v3_sym: I don't have this. Why?
