Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8201 Fix DataFrame subsets indexing in CSVDataset() #8351

Open
wants to merge 6 commits into
base: dev
Choose a base branch
from

Conversation

bartosz-grabowski
Copy link
Contributor

Fixes #8201 .

Description

convert_tables_to_dicts was using .loc to index DataFrames which was changed to .iloc. It was causing unexpected behavior in CSVDataset as demonstrated in #8201 because .loc expects labels, but was instead provided positions of the rows.
Unittest was added which fails before the change and passes afterwards.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@bartosz-grabowski bartosz-grabowski marked this pull request as ready for review February 18, 2025 23:41
@bartosz-grabowski bartosz-grabowski marked this pull request as draft February 18, 2025 23:58
Also updated the unittest to pass after the fix and not before.

Signed-off-by: Bartosz Grabowski <[email protected]>
@bartosz-grabowski bartosz-grabowski marked this pull request as ready for review February 19, 2025 00:35
@bartosz-grabowski
Copy link
Contributor Author

/black

@bartosz-grabowski
Copy link
Contributor Author

@KumoLiu @ericspod could you take a look at this?

@ericspod ericspod requested review from ericspod and KumoLiu March 3, 2025 17:31
Copy link
Member

@ericspod ericspod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go, we're waiting on a large PR to be merged, then we'll update this one and move forward.

@KumoLiu
Copy link
Contributor

KumoLiu commented Mar 4, 2025

/build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSVDataset behaves unexpectedly if src is a dataframe unexpected index
3 participants