Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

equal_rowcount pass the test when comparing 2 different tables of the same groupby columns #986

Open
1 of 5 tasks
light-le opened this issue Feb 19, 2025 · 0 comments · May be fixed by #987
Open
1 of 5 tasks

equal_rowcount pass the test when comparing 2 different tables of the same groupby columns #986

light-le opened this issue Feb 19, 2025 · 0 comments · May be fixed by #987
Labels
bug Something isn't working triage

Comments

@light-le
Copy link

Describe the bug

Using equal_rowcount between 2 different tables with the same groupby columns results in a false positive test

Steps to reproduce

  1. Create 2 different tables, with the same group by column. Let's call this gb. gb can have different values between the 2 tables (but same type). For example: 1 table could be (select 'x' as gb union all (select 'y' as gb)) and the other could be (select 'z' as gb)
  2. Apply equal_rowcount between the 2 table using gb as a group_by_columns
  3. Run the dbt data test

Expected results

This test should fail and users should be alerted

Actual results

This test passes without any problem

Screenshots and log output

Result of dbt_internal_test if using above example
Image

Because we have coalesce functions in the test evaluation, all null diff_count result as 0 failure, while it should be something else

select
      sum(coalesce(diff_count, 0)) as failures,
      sum(coalesce(diff_count, 0)) != 0 as should_warn,
      sum(coalesce(diff_count, 0)) != 0 as should_error
    from dbt_internal_test

System information

The contents of your packages.yml file:
packages:

  • package: dbt-labs/dbt_utils
    version: 1.3.0

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:
I'm using dbt cloud so I suppose it's versionless

Additional context

I think it's line 63 of equal_rowcount.sql. Instead of abs(count_a - count_b) as diff_count it should be abs(coalesce(count_a, 0) - coalesce(count_b, 0))

If this is applied, diff_count would no longer be null and coalesce() is no longer needed inside the sum() functions above. It could be moved to outside the sum() function like #974 which resolves #973

Are you interested in contributing the fix?

Yes I can contribute

@light-le light-le added bug Something isn't working triage labels Feb 19, 2025
@light-le light-le linked a pull request Feb 19, 2025 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant