Adds metadata to FlyteFile #3160

Open
wants to merge 5 commits into
base: master
from


thomasjpfan
Member

@thomasjpfan thomasjpfan commented Feb 27, 2025

Tracking issue

Closes flyteorg/flyte#6257

Why are the changes needed?

With this PR, metadata can be attached to a FlyteFile.

What changes were proposed in this pull request?

This PR adds metadata to FlyteFile so that it can adjust the Literal's metadata.
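For illustration, a minimal sketch of what attaching metadata to a file reference and copying it onto the Literal could look like, assuming metadata is a flat string-to-string mapping. `FileRefSketch`, `to_literal_dict`, and the `s3://my-bucket/...` URI are hypothetical; this is not flytekit's FlyteFile API, and the plain dict only stands in for a real Flyte Literal.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileRefSketch:
    """Hypothetical FlyteFile-like reference; not flytekit's actual class."""

    path: str
    # Assumption: metadata is a flat str -> str mapping.
    metadata: Dict[str, str] = field(default_factory=dict)


def to_literal_dict(ref: FileRefSketch) -> dict:
    """Build a plain-dict stand-in for a Literal, copying the metadata onto it."""
    return {
        "scalar": {"blob": {"uri": ref.path}},
        # Copy so later edits to ref.metadata do not leak into the literal.
        "metadata": dict(ref.metadata),
    }


ref = FileRefSketch("s3://my-bucket/data.csv", metadata={"source": "sensor-a"})
literal = to_literal_dict(ref)
print(literal["metadata"])  # {'source': 'sensor-a'}
```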

How was this patch tested?

Unit tests and integration tests were updated.

The UI automatically shows the new metadata:

[Screenshot 2025-02-27 at 10:16:32 AM]

Summary by Bito

This PR has two main components: 1) Adds metadata support to FlyteFile with serialization/deserialization handling and type validation, enabling storage of string key-value pairs and preserving metadata during workflow execution. 2) Implements modular agent-related dependencies by moving them to an optional dependency group, with updates to Dockerfile.agent and dependency management through pyproject.toml.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2
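As a rough sketch of the serialization/deserialization round trip the summary describes, assuming JSON as a stand-in for the real Literal encoding: `serialize_file_ref` and `deserialize_file_ref` are hypothetical helpers, not flytekit functions. The point is only that metadata written on one side comes back unchanged, and that a payload without metadata decodes to an empty dict.

```python
import json
from typing import Dict, Tuple


def serialize_file_ref(uri: str, metadata: Dict[str, str]) -> str:
    """Encode a file URI and its metadata; JSON stands in for the real Literal protobuf."""
    return json.dumps({"uri": uri, "metadata": metadata})


def deserialize_file_ref(payload: str) -> Tuple[str, Dict[str, str]]:
    """Decode the payload, treating absent or null metadata as an empty dict."""
    data = json.loads(payload)
    return data["uri"], dict(data.get("metadata") or {})


payload = serialize_file_ref("s3://my-bucket/report.parquet", {"rows": "1000"})
uri, metadata = deserialize_file_ref(payload)
print(uri, metadata)  # s3://my-bucket/report.parquet {'rows': '1000'}
```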

Signed-off-by: Thomas J. Fan <[email protected]>
@flyte-bot
Contributor

flyte-bot commented Feb 27, 2025

Code Review Agent Run #6d466d

Actionable Suggestions - 9
  • flytekit/types/file/file.py - 5
    • Consider metadata validation before serialization · Line 164-164
    • Consider validating metadata before serialization · Line 173-173
    • Consider adding metadata dictionary validation · Line 301-301
    • Consider validating metadata dictionary type · Line 622-622
    • Consider validating metadata dictionary content · Line 720-720
  • tests/flytekit/unit/types/file/test_file.py - 1
    • Consider adding file parameter validation · Line 89-90
  • tests/flytekit/integration/remote/workflows/basic/flytefile.py - 3
    • Consider metadata validation before access · Line 23-26
    • Consider metadata existence check before access · Line 23-24
    • Consider making info parameter consistently optional · Line 50-52
Additional Suggestions - 1
  • tests/flytekit/unit/types/file/test_file.py - 1
    • Consider simplifying metadata retrieval logic · Line 117-121
Review Details
  • Files reviewed - 4 · Commit Range: eadbe93..b77f618
    • flytekit/types/file/file.py
    • tests/flytekit/integration/remote/workflows/basic/flytefile.py
    • tests/flytekit/unit/types/file/test_file.py
    • tests/flytekit/unit/utils/test_pbhash.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful
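The validation suggestions above amount to checking that metadata is a dict whose keys and values are all strings. A minimal sketch of such a check, using a hypothetical `validate_metadata` helper that is not part of flytekit or this PR:

```python
from typing import Any, Dict, Optional


def validate_metadata(metadata: Optional[Any]) -> Dict[str, str]:
    """Return a validated copy of metadata, raising TypeError unless it is str -> str.

    Hypothetical helper illustrating the review suggestion; not part of flytekit.
    """
    if metadata is None:
        return {}
    if not isinstance(metadata, dict):
        raise TypeError(f"metadata must be a dict, got {type(metadata).__name__}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"metadata entries must be str -> str, got {key!r}: {value!r}")
    # Return a copy so callers cannot mutate the stored metadata through their dict.
    return dict(metadata)


print(validate_metadata({"owner": "data-team"}))  # {'owner': 'data-team'}
try:
    validate_metadata({"size": 1024})
except TypeError as exc:
    print(exc)  # metadata entries must be str -> str, got 'size': 1024
```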

AI Code Review powered by Bito Logo


codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 74.67%. Comparing base (2e12f43) to head (fe0f604).
Report is 1 commit behind head on master.

Files with missing lines:
flytekit/types/file/file.py: 80.00% patch coverage, 4 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3160      +/-   ##
==========================================
- Coverage   83.58%   74.67%   -8.92%     
==========================================
  Files           3      212     +209     
  Lines         195    22040   +21845     
  Branches        0     2866    +2866     
==========================================
+ Hits          163    16458   +16295     
- Misses         32     4778    +4746     
- Partials        0      804     +804     
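Assuming Codecov's usual definition (hits divided by all tracked lines, with partials counted as not fully covered), the head column of the diff reproduces the reported project coverage:

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
= \frac{16458}{16458 + 4778 + 804} = \frac{16458}{22040} \approx 74.67\%
```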


@flyte-bot
Contributor

flyte-bot commented Feb 27, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Feature Improvement - FlyteFile Metadata Support

file.py - Added metadata field to FlyteFile with serialization/deserialization support

flytefile.py - Added integration tests for FlyteFile metadata functionality

test_file.py - Added unit tests for FlyteFile metadata features

Signed-off-by: Thomas J. Fan <[email protected]>
@flyte-bot
Contributor

flyte-bot commented Feb 27, 2025

Code Review Agent Run #7a5220

Actionable Suggestions - 1
  • flytekit/types/file/file.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: b77f618..fe0f604
    • flytekit/types/file/file.py
    • tests/flytekit/unit/utils/test_pbhash.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


@flyte-bot
Contributor

flyte-bot commented Feb 27, 2025

Code Review Agent Run #a08b41

Actionable Suggestions - 0
Review Details
  • Files reviewed - 3 · Commit Range: fe0f604..a2c6a6b
    • Dockerfile.agent
    • flytekit/clis/sdk_in_container/serve.py
    • pyproject.toml
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


Collaborator

@eapolinario eapolinario left a comment


Do you think it's worth reserving certain fields (e.g. file size)?

@thomasjpfan
Member Author

> Do you think it's worth reserving certain fields (e.g. file size)?

I do not think we need to reserve them. If we track file size, we can add it by default and still let a user override it.

I'm on the fence about automatically adding metadata, because there is no way for Flyte to make sure it stays in sync with the file. With user-provided metadata, it is the user's responsibility to make sure the file behind the reference does not change underneath them.
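A minimal sketch of the precedence described here, where a derived value such as the file size is only a default and user-provided entries override it. `merge_metadata` and the `"size"` key are hypothetical; the thread leaves automatic metadata as an open question.

```python
import os
import tempfile
from typing import Dict, Optional


def merge_metadata(path: str, user_metadata: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Combine derived defaults with user metadata; user-provided keys take precedence."""
    # Derived default: the local file size in bytes, stored as a string value.
    defaults = {"size": str(os.path.getsize(path))}
    # In a dict merge, later keys override earlier ones, so user values win.
    return {**defaults, **(user_metadata or {})}


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(merge_metadata(tmp.name))                       # {'size': '5'}
print(merge_metadata(tmp.name, {"size": "unknown"}))  # {'size': 'unknown'}
os.remove(tmp.name)
```

Letting user values win keeps the user in control, at the cost that a stale user-supplied size is never corrected automatically.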

@eapolinario
Collaborator

> I'm on the fence about automatically adding metadata, because there is no way for Flyte to make sure it's in sync.

fsspec has info in its API, which returns the file size (among other things), so it's not too crazy to think we could expose a way to verify the file size against what's in the metadata. I'm not saying we make this the default, but for the really paranoid I can see a use for it (although if we're really going that route, I'm sure one could rely on adding metadata directly to the objects in the blob store, e.g. in S3; I'm not sure what the level of support for that is in fsspec).
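A sketch of the optional size check suggested here. fsspec filesystems expose `info(path)`, which returns a dict that includes a `"size"` entry; to keep the example runnable without fsspec installed, `LocalInfoFS` is a hypothetical stand-in with the same shape, and `size_matches_metadata` is likewise an illustrative helper, not an existing flytekit function.

```python
import os
import tempfile
from typing import Any, Dict, Mapping


class LocalInfoFS:
    """Hypothetical stand-in mimicking the shape of fsspec's info() for local files."""

    def info(self, path: str) -> Dict[str, Any]:
        return {"name": path, "size": os.path.getsize(path), "type": "file"}


def size_matches_metadata(fs: Any, path: str, metadata: Mapping[str, str]) -> bool:
    """Compare the 'size' recorded in metadata with the size the filesystem reports.

    Raises KeyError if no 'size' entry was recorded.
    """
    return int(metadata["size"]) == fs.info(path)["size"]


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"12345")
fs = LocalInfoFS()
print(size_matches_metadata(fs, tmp.name, {"size": "5"}))  # True
print(size_matches_metadata(fs, tmp.name, {"size": "4"}))  # False
os.remove(tmp.name)
```

With fsspec installed, `fsspec.filesystem("file")` (or a remote filesystem such as `"s3"` via s3fs) could be passed as `fs` instead, assuming its `info` result reports `"size"` in bytes.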

tbh, I'd be OK with adding metadata by default (controlled by an env var), along with an explanation in the docs.
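One way the env-var switch could look, as a sketch: automatic metadata stays off unless the variable opts in, so behavior is unchanged by default. The variable name `FLYTE_FILE_AUTO_METADATA` and the `default_metadata` helper are made up for this example.

```python
import os
import tempfile
from typing import Dict

# Hypothetical opt-in variable; the name is made up for this sketch.
AUTO_METADATA_ENV = "FLYTE_FILE_AUTO_METADATA"


def default_metadata(path: str) -> Dict[str, str]:
    """Return automatically derived metadata, or nothing unless the env var opts in."""
    flag = os.environ.get(AUTO_METADATA_ENV, "").strip().lower()
    if flag not in {"1", "true", "yes"}:
        return {}
    # Opted in: record the current file size in bytes as a string value.
    return {"size": str(os.path.getsize(path))}


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
os.environ.pop(AUTO_METADATA_ENV, None)
print(default_metadata(tmp.name))  # {}
os.environ[AUTO_METADATA_ENV] = "true"
print(default_metadata(tmp.name))  # {'size': '3'}
os.remove(tmp.name)
```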

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues:
[Core feature] Add metadata to FlyteFile
3 participants