Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix input_needed_for_output to not return one less than needed, causing draining down the line #811

Merged
merged 19 commits into from
Mar 3, 2025

Conversation

padenot
Copy link
Collaborator

@padenot padenot commented Jan 23, 2025

The calculation was incorrect. This meant that in some circumstances,
e.g. playing a 44.1Hz stream on a 96kHz interface, with a block of 441
frames on Windows, rounding was eventually returning 440 instead of 441
in the callback, leading to the stream entering the draining phase.

@padenot
Copy link
Collaborator Author

padenot commented Jan 23, 2025

This fixes BMO#1940319.

Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that github interprets review of a commit as review of the entire pull request.

Thank you for testing thoroughly.
I expect that this function should be able to be simple without special cases. The special cases make me suspect that something else may not be quite right.

Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent several hours trying to prove that a certain calculation would produce enough input for an output, but I have not yet come up with something concrete.

The resampler advance can be understood from the code in resampler_basic_zero() (or any of the other resampler_basic_funcs).

The resampler can be considered to count at num_rate * den_rate quantums per
second, where den_rate = out_rate / GCF(in_rate, out_rate) and num_rate = in_rate / GCF(in_rate, out_rate).

samp_frac_num counts in these quantums.
Each last_sample unit represents den_rate quantums.
Adding the two together gives a position.

Each resampler iteration advances by num_rate quantums (1/den_rate seconds), and produces one output frame.

Initially the first input and output frames correspond to 0 quantums.

Each speex_resample() call leaves the resampler at a position exactly aligned with an output frame, so output_frame_count output frames requires an advance of output_frame_count * num_rate quantums.

Input frames intervals are den_rate quantums, but the first input frame is aligned differently on different speex_resample() calls, and there is an additional complicating factor that the speex resampler will sometimes consume input frames that it doesn't actually use.

last_sample values correspond to input frame positions, but last_sample can get ahead of the number of input frames consumed. In the last iteration of the resampler_basic_func, last_sample can advance beyond *in_len. The input frames stepped over are not needed yet. Input frames are consumed up to last_sample, even the frames that weren't needed.

The consuming of some unused input frames aligns better with the expected ratio of input to output frames given the alignment at 0 quantums. When last_sample advances beyond *in_len, if those extra frames are not yet available, they are pulled on the next speex_resample() call, pulling more frames than expected.

@padenot padenot force-pushed the resampler-draining branch from 8f7e192 to 1b52d0f Compare January 28, 2025 17:40
@padenot
Copy link
Collaborator Author

padenot commented Jan 28, 2025

I'll probably add some test that checks that the output is sensible, iirc we have an utility for that somewhere (feed a sine wave at a certain freq, check that we get that at the output).

@padenot padenot force-pushed the resampler-draining branch from 1b52d0f to 3b1bcae Compare January 28, 2025 17:51
Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@padenot padenot force-pushed the resampler-draining branch 3 times, most recently from 56d90bd to 8ea49e1 Compare February 4, 2025 13:50
Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The large tolerance implies that something is not right either in the continuity test or in what it is testing.

Would you like to drop that commit from the PR and deal with that separately, to get the draining fix landed?
Or do you think the continuity test is important to check that the drain fix is not breaking continuity?

Comment on lines 1425 to 1426
// `data`. This isn't stricly necessary but helps having cleaner
// dumps for manual inspection in e.g. Audacity. In all our test cases
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MSE would be much higher without this, because the initial silence is very different from a sine wave, so I assume this stripping is necessary.

Comment on lines 1456 to 1458
// 1.5 is a good higher bound found empirically. Any mistake, e.g.
// discontinuity make the MSE shoot up dramatically.
ASSERT_LT(mse, 1.5);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.5 seems a high tolerance for a mean squared error between samples in the range [-0.8, 0.8]. Comparing 0.8 * sin() with zero would have a mean squared error of about 0.32, if my maths is correct. I would expect the MSE for the resampler output to be a small fraction of 1 if everything is behaving as intended.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large error in a single sample is greatly attenuated by a mean square error.
A maximum error might be better for our purposes, and easier to reason about a tolerance.

Comment on lines 1214 to 1225
for (size_t i = 0; i < len; ++i) {
float c = std::cos(phase_incr * static_cast<float>(i));
float s = std::sin(phase_incr * static_cast<float>(i));
sum_cos += signal[i] * c;
sum_sin += signal[i] * s;
}

float amplitude = 2.0f * std::sqrt(sum_cos * sum_cos + sum_sin * sum_sin) /
static_cast<float>(len);
float phi = std::atan2(sum_sin, sum_cos);
Copy link
Collaborator

@karlt karlt Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may have discarded my copy of Numerical Methods for Engineers.
This is a fancy simplification of what was presented in the other references.
I understand dropping of the column of 1s from D_0 because we expect zero DC offset.

(D_0^T D_0)^-1 has been replaced with 2/n * I.
Do you have a reference for that?
I suspect that would not hold exactly if the signal duration is not an integer multiple of the period (or perhaps half period?). The skipping of leading samples means that the duration is not an integer multiple, so this formulation would be an approximation for duration >> period.
I suspect this may be related to "When frequencies are chosen precisely as recommended for coherent sampling, with an exact integer number of cycles in a record, a DFT will give the same results as a three-parameter sine wave fit."

If such an approximation is good enough, then some explanation would be helpful.
Doing the full calculation of D_0^T D_0 would not be a lot more code.

Copy link
Collaborator

@karlt karlt Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how significant the error here is, but the phase should constant so it is not necessary to perform the fit over the entire resampled duration.
That means that the duration of the fit can be selected so as to improve the accuracy of this 2/n * I approximation. The residuals can still be calculated over most of the resampled output (excluding the leading frames affected by zeros).

We want an integer number of 1/1000 s periods. If a single period were selected, then at 44.1kHz, it would be covered by 44.1 samples. We want an integer number of samples, so multiply the period by 10. That makes the 2/n * I approximation as accurate as it can be because multiplying the period further just duplicates the same contributions to D_0^T D_0. I doubt larger periods would increase the accuracy of D_0^T x either, if the resampler is behaving as expected, because x should be periodic in the same way.

This is equivalent to a couple of the terms of a DFT.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that formulation would then be exact because the off-diagnal sin.cos terms should cancel out to be exactly zeros, and the components of the diagonal terms would be evenly balanced either side of 0.5.

Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll hold off on looking at the threading until the test seems to be passing.

Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reassuring about the behavior of cubeb.
Once the identified issues in the test are resolved, the maximum MSE is 0.001920.
With more aggression stripping leading zeros, and perhaps also initial samples containing oscillations from the slope discontinuity, the maximum error might be even less.

@karlt
Copy link
Collaborator

karlt commented Feb 14, 2025

With more aggression stripping leading zeros, and perhaps also initial samples containing oscillations from the slope discontinuity, the maximum error might be even less.

Because the resampler is a symmetric FIR filter, skipping twice the number of zero samples before the sine starts would skip the errors due to the slope discontinuity and cover over inaccuracies in finding the first sine sample. I guess there's a choice between testing for precise residuals in most of the resampler output or accepting less precise residuals but testing the early samples in the sine.

…ng draining down the line

The calculation was incorrect. This meant that in some circumstances,
e.g. playing a 44.1Hz stream on a 96kHz interface, with a block of 441
frames on Windows, rounding was eventually returning 440 instead of 441
in the callback, leading to the stream entering the draining phase.
…ck sizes and sample-rates

This might be a bit much, runtime is 25s debug ASAN, 6s opt, but should
be thorough.
Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At CUBEB_RESAMPLER_QUALITY_DEFAULT, these changes reduce maximum MSE to
MSE=0.000119, effective_block=2048, resampling_ratio=2.756250, input-block=5644.800000, skipped=32

I haven't looked into why the start of the sine is not found for _VOIP.

Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would reduce the remaining MSEs to < 1e-6.
If you don't want to do this, then please explain the approximation and when it would be exact.

@padenot padenot force-pushed the resampler-draining branch 2 times, most recently from 5493e8f to 4cb6597 Compare February 28, 2025 16:27
Copy link
Collaborator

@karlt karlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks

@padenot padenot merged commit 34ff385 into master Mar 3, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants