Refactor Seed Dataset to improve compatibility and simplify usage #1734

Open
wants to merge 11 commits into
base: master
from

Conversation

apokryphosx
Collaborator

@apokryphosx apokryphosx commented Mar 7, 2025

Description

This PR solves:
#1732
#1733

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and poetry.lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

initialized from HF/Pytorch/JSON/list of Dicts,
remove the need for setup call and subsequently
cleanup
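
As a rough illustration of the "no setup call" idea in the commit messages above, here is a minimal sketch of a dataset that is fully usable right after construction. The class name `SeedDatasetSketch` and its behavior are hypothetical and not taken from the PR's diff. Only the list-of-dicts and JSON-file inputs are covered; HF and PyTorch inputs are left out to keep the example dependency-free.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union

Record = Dict[str, Any]


class SeedDatasetSketch:
    """Hypothetical stand-in for illustration, not the project's actual class."""

    def __init__(self, data: Union[List[Record], str, Path]) -> None:
        # All loading happens in the constructor, so callers never need a
        # separate setup() step before indexing into the dataset.
        if isinstance(data, (str, Path)):
            with open(data, encoding="utf-8") as fh:
                loaded = json.load(fh)
        else:
            loaded = data
        if not isinstance(loaded, list) or not all(
            isinstance(rec, dict) for rec in loaded
        ):
            raise TypeError("expected a list of dicts or a JSON file containing one")
        # Shallow-copy each record so adding or replacing keys on the
        # caller's dicts afterwards does not affect the dataset.
        self._records: List[Record] = [dict(rec) for rec in loaded]

    def __len__(self) -> int:
        return len(self._records)

    def __getitem__(self, idx: int) -> Record:
        return self._records[idx]
```

With this shape, `SeedDatasetSketch([{"question": "2 + 2?"}])` and `SeedDatasetSketch("seed.json")` (a hypothetical file path) would both yield an object that supports `len()` and indexing immediately.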
@apokryphosx apokryphosx linked an issue Mar 7, 2025 that may be closed by this pull request
2 tasks
@apokryphosx apokryphosx requested a review from hallerite March 7, 2025 13:52
@hallerite hallerite added the P0 Task with high level priority label Mar 7, 2025
@hallerite hallerite added this to the Sprint 24 milestone Mar 7, 2025
@hallerite hallerite added enhancement New feature or request Refactor labels Mar 7, 2025
Collaborator

@hallerite hallerite left a comment

Thanks @apokryphosx, I left some comments.

instead of strings and add seed for reproducibility
between simply skipping invalid datapoints in a
seed dataset and throwing an exception
seed dataset to ensure they are defined before the
other functions are
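
The commit messages above mention choosing between skipping invalid datapoints and throwing an exception, plus a seed for reproducibility. A minimal sketch of how those two behaviors could look follows. The function names, the `strict` flag, and the `REQUIRED_KEYS` schema are assumptions made for illustration and are not confirmed by the PR.

```python
import random
from typing import Any, Dict, List, Optional, Sequence

Record = Dict[str, Any]

# Hypothetical schema: the real required fields are not stated in the PR text.
REQUIRED_KEYS = ("question", "final_answer")


def validate_records(records: Sequence[Record], strict: bool = False) -> List[Record]:
    """Return the valid records; in strict mode, raise on the first invalid one."""
    valid: List[Record] = []
    for i, rec in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in rec]
        if not missing:
            valid.append(rec)
        elif strict:
            raise ValueError(f"record {i} is missing keys: {missing}")
        # Non-strict mode: the invalid record is skipped without an error.
    return valid


def sample_many(
    records: Sequence[Record], n: int, seed: Optional[int] = None
) -> List[Record]:
    """Draw n records with replacement using a private, seedable RNG."""
    # A dedicated random.Random instance keeps sampling reproducible
    # without touching the global random state.
    rng = random.Random(seed)
    return [rng.choice(records) for _ in range(n)]
```

Calling `sample_many` twice with the same `seed` returns the same sequence of records, which is the property a reproducibility seed is meant to provide.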
Labels
enhancement (New feature or request), P0 (Task with high level priority), Refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Seed Dataset has compatibility issues
Seed Dataset has unexpected behavior
2 participants