-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: make Table.join() use the same logic as Table.rename() #10753
base: main
Are you sure you want to change the base?
Conversation
3e81ab2
to
35483fb
Compare
35483fb
to
4e4d811
Compare
There is this code inside expr/types/join.py: ```python def disambiguate_fields( how, predicates, equalities, left_fields, right_fields, left_template, right_template, ): """Resolve name collisions between the left and right tables.""" collisions = set() left_template = left_template or "{name}" right_template = right_template or "{name}" ... ``` This makes it so that "" is treated equivalently to "{name}". So this change doesn't have any behavior change, it just makes the defaults follow a more consistent pattern. The old patterns was a remnant from the lsuffix API we used to have.
before, any falsy values were interpreted as a noop. Now this errors.
4e4d811
to
433af80
Compare
This seems like a behavior change when chaining joins with overlapping column names. It's a bit concerning that no tests are failing, as this comes up in the TPC-DS query set quite a few times. Can you add a small test that fails before this change and then passes with your change so that we have a record of this behavior being run at least somewhere? Generally speaking, reusing renaming doesn't seem like a huge win to me, because while there may be commonality between the |
Sorry, can you be more verbose with an example? Did you see the commit description for |
There are a few different changes here, I can split them into separate PRs if you want. I think the first two refactor commits are strictly improvements, with no behavior change (I think), and feel pretty good about them. The second two are behavior changes and could use more discussion.
lname
param of the.join()
methods, from""
to"{name}"
. This by itself doesn't actually have any behavior change, it just is a lot more symmetrical with the other argument. This default is a remnant of when we used thelsuffix
andrsuffix
API..join()
. This increases the scope of .join(), so it can accept a wider range of inputs. But makes our API more consistent. I wonder if there are other places where we could take advantage of this util method?.join(..., lname="")
. Before. it would act the same as passing"{name}"
, now it errors. I doubt this affects many users, but probably affects some. I can adjust this in several ways (In order of my preference):util.rename()
so that resolving to falsy/None is an error. Users MUST return a string that they want as the result. This is the most breaking, but also I think the most sane. I already proposed this during the old .rename() refactor, but we decided against it.util.rename()
so that anything falsy is interpreted as a no-op. This is similar to how currentlyNone
is interpreted. This would not be breaking at all, but would increase the scope of both .join() and .rename()