Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): add missing_ok: bool param to Table.drop() #10816

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 48 additions & 5 deletions ibis/expr/types/relations.py
Original file line number Diff line number Diff line change
Expand Up @@ -2337,13 +2337,15 @@

return ops.Project(self, exprs).to_expr()

def drop(self, *fields: str | Selector) -> Table:
def drop(self, *fields: str | Selector, missing_ok: bool = False) -> Table:
"""Remove fields from a table.

Parameters
----------
fields
Fields to drop. Strings and selectors are accepted.
missing_ok
If True, do not raise an error if a given field does not exist.

Returns
-------
Expand Down Expand Up @@ -2401,6 +2403,28 @@
│ Torgersen │ 19.3 │ 193 │ 3450 │ female │ … │
└───────────┴───────────────┴───────────────────┴─────────────┴────────┴───┘

If a given column does not exist, then by default an error is raised.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If a given column does not exist, then by default an error is raised.
If a column does not exist, then by default an error is raised.

If you want to ignore missing columns, you can pass `missing_ok=True`.
This is useful if you need a column to not exist,
but you don't know if it does or not.
Comment on lines +2425 to +2426
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This is useful if you need a column to not exist,
but you don't know if it does or not.

I don't this this bit of explanation is necessary.


>>> t.drop("column_that_must_not_exist_afterwards") # doctest: +SKIP
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left out the specific error because I am worried that is going to change and become stale.
I would only want to show the actual error type if we remove the doctest: +SKIP marker, but I thought the extra boilerplate to actually be able to catch the error was not worth it. Open to adjusting this though.

Traceback (most recent call last):
...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
...
...

>>> t.drop("column_that_must_not_exist_afterwards", missing_ok=True).head()
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ … ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━┩
│ string │ string │ float64 │ float64 │ int64 │ … │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼───┤
│ Adelie │ Torgersen │ 39.1 │ 18.7 │ 181 │ … │
│ Adelie │ Torgersen │ 39.5 │ 17.4 │ 186 │ … │
│ Adelie │ Torgersen │ 40.3 │ 18.0 │ 195 │ … │
│ Adelie │ Torgersen │ NULL │ NULL │ NULL │ … │
│ Adelie │ Torgersen │ 36.7 │ 19.3 │ 193 │ … │
│ … │ … │ … │ … │ … │ … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴───┘

Drop with selectors, mix and match

>>> import ibis.selectors as s
Expand All @@ -2417,12 +2441,31 @@
│ Torgersen │ 193 │ 3450 │ female │ 2007 │
└───────────┴───────────────────┴─────────────┴────────┴───────┘
"""
if not fields:

def _get_cols(field) -> list[str]:
try:
cols = self._fast_bind(field)
except com.IbisTypeError as e:
if "is not found in table" in str(e):
if missing_ok:
return []
raise
except AttributeError as e:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to catch these two different kinds of exceptions here, and them being not very specific, I think really demonstrates the need for #10412

if "has no attribute" in str(e):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for substrings in error messages is extremely fragile. The implementation of this functionality should basically never depend on the specific content an informational error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, binding for every column is very expensive for very wide tables.

Instead, what we should do is the following:

  1. Introduce FieldNotFoundError as you say
  2. Allow bind (perhaps through an option to _fast_bind?) to return an optimistic (possibly) subset: basically whatever columns are found from fields.

if missing_ok:
return []
raise

Check warning on line 2457 in ibis/expr/types/relations.py

View check run for this annotation

Codecov / codecov/patch

ibis/expr/types/relations.py#L2457

Added line #L2457 was not covered by tests
return [Expr.get_name(c) for c in cols]

to_drop = []
for field in fields:
to_drop.extend(_get_cols(field))
if not to_drop:
# no-op if nothing to be dropped
return self

columns_to_drop = frozenset(map(Expr.get_name, self._fast_bind(*fields)))
return ops.DropColumns(parent=self, columns_to_drop=columns_to_drop).to_expr()
return ops.DropColumns(
parent=self, columns_to_drop=frozenset(to_drop)
).to_expr()

def filter(
self,
Expand Down
9 changes: 9 additions & 0 deletions ibis/tests/expr/test_table.py
Original file line number Diff line number Diff line change
Expand Up @@ -1794,9 +1794,18 @@ def test_drop():
res = t.drop(_.a, "b")
assert res.schema() == t.select("c", "d").schema()

res = t.drop(s.cols("a", "b"), "d")
assert res.schema() == t.select("c").schema()

with pytest.raises(com.IbisTypeError):
t.drop("e")

res = t.drop("e", missing_ok=True)
assert res.schema() == t.schema()

res = t.drop("a", _.e, missing_ok=True)
assert res.schema() == t.select("b", "c", "d").schema()


def test_drop_equality():
t = ibis.table(dict.fromkeys("abcd", "int"))
Expand Down
Loading