namespace files not present after successful upload #6687

Open
MarthaScheffler opened this issue Jan 8, 2025 · 3 comments
Labels
area/backend Needs backend code changes area/frontend Needs frontend code changes bug Something isn't working

MarthaScheffler commented Jan 8, 2025

Describe the issue

Issue
When many files are uploaded into a namespace, some of them may not be available afterwards, even though the upload succeeded.

See the Slack thread: https://kestra-io.slack.com/archives/C03FQKXRK3K/p1719913391618119

"Workaround"
Delete the files and folder structure in the namespace before replaying the execution.

Reproduce
To reproduce, use the following three flows. They can be adjusted depending on whether the files should be deleted afterwards (the cleanup task can be disabled). If a subflow fails, try rerunning it, both with and without first deleting the other files in the namespace. Run with different concurrency settings.

Flow1 (trigger subflows):


id: trigger_flow
namespace: kestra.namespace_test
tasks:
  - id: parallel
    type: io.kestra.plugin.core.flow.Parallel
    concurrent: 10
    tasks:
      - id: for_each
        type: io.kestra.plugin.core.flow.ForEach
        values: [1,2,3,4,5,6,7,8,9,10]
        concurrencyLimit: 10
        tasks:
          - id: subflow
            type: io.kestra.plugin.core.flow.Subflow
            namespace: "{{ flow.namespace }}"
            wait: true
            transmitFailed: true
            flowId: upload_namespace_files_flow
            inputs:
              subflow_number: "{{ taskrun.value }}"

Flow2 (get files and upload to namespace folder, optional cleanup):


id: upload_namespace_files_flow
namespace: kestra.namespace_test

inputs:
  - id: subflow_number
    type: INT
    defaults: 1

tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository_1
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/1"
      - id: clone_repository_2
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/2"
      - id: clone_repository_3
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/3"
      - id: clone_repository_4
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/4"
      - id: upload_files_into_namespace
        type: io.kestra.plugin.core.namespace.UploadFiles
        files:
          - "glob:**/subflow_{{ inputs.subflow_number }}/**"
        namespace: "{{ flow.namespace }}"

  - id: start_subflow
    type: io.kestra.plugin.core.flow.Subflow
    namespace: "{{ flow.namespace }}"
    flowId: read_namespace_files_flow
    inputs:
      subflow_number: "{{ inputs.subflow_number }}"
    wait: true
    transmitFailed: true

  - id: remove_namespace_files
    # folders stay
    disabled: true
    type: io.kestra.plugin.core.namespace.DeleteFiles
    namespace: "{{ flow.namespace }}"
    files:
      - "subflow_{{ inputs.subflow_number }}/**"

Flow3 (access namespace files):


id: upload_namespace_files_flow
namespace: kestra.namespace_test

inputs:
  - id: subflow_number
    type: INT
    defaults: 1

tasks:
  - id: working_directory
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: clone_repository_1
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/1"
      - id: clone_repository_2
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/2"
      - id: clone_repository_3
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/3"
      - id: clone_repository_4
        type: io.kestra.plugin.git.Clone
        url: https://github.com/kestra-io/dbt-example
        branch: main
        directory: "subflow_{{ inputs.subflow_number }}/4"
      - id: upload_files_into_namespace
        type: io.kestra.plugin.core.namespace.UploadFiles
        files:
          - "glob:**/subflow_{{ inputs.subflow_number }}/**"
        namespace: "{{ flow.namespace }}"

  - id: start_subflow
    type: io.kestra.plugin.core.flow.Subflow
    namespace: "{{ flow.namespace }}"
    flowId: read_namespace_files_flow
    inputs:
      subflow_number: "{{ inputs.subflow_number }}"
    wait: true
    transmitFailed: true

  - id: remove_namespace_files
    # folders stay
    #disabled: true
    type: io.kestra.plugin.core.namespace.DeleteFiles
    namespace: "{{ flow.namespace }}"
    files:
      - "subflow_{{ inputs.subflow_number }}/**"

Environment

  • Kestra Version: develop
  • OSS on Kubernetes
  • AWS S3 backend
  • Version 0.20.0
@MarthaScheffler MarthaScheffler added area/backend Needs backend code changes area/frontend Needs frontend code changes bug Something isn't working labels Jan 8, 2025
@github-project-automation github-project-automation bot moved this to Backlog in Issues Jan 8, 2025
MarthaScheffler (Author) commented

Some additional details: after deleting all (empty) folders in the namespace, I navigated somewhere else and then back into the namespace, and a new folder suddenly appeared, filled with files. It seems to be one of the folders that was uploaded but was never accessible. I then checked my S3 bucket on AWS, and indeed there are plenty of folders for the namespace, all filled with files, although they are not shown in the Kestra UI. I have now deleted all of them on S3, and my flows run again, probably until whatever limit is reached again.

MarthaScheffler (Author) commented

Could this have something to do with listObjects on S3 returning at most 1000 objects per request? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects.html
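
To illustrate the hypothesis above, here is a minimal Python sketch of how a single S3 list call can silently miss objects once a prefix holds more than 1000 keys, and how following the continuation token avoids that. It is only an illustration: how Kestra's S3 storage backend actually lists files is not confirmed here. `FakeS3Client`, `make_demo_keys`, the bucket name, the key layout, and the file count per clone are all hypothetical; the fake client only mimics the `list_objects_v2` response fields (`Contents`, `IsTruncated`, `NextContinuationToken`) so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, List, Optional

PAGE_LIMIT = 1000  # S3 returns at most 1000 keys per list request


class FakeS3Client:
    """Hypothetical in-memory stand-in mimicking the paging of boto3's list_objects_v2."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = sorted(keys)

    def list_objects_v2(
        self,
        Bucket: str,
        Prefix: str = "",
        ContinuationToken: Optional[str] = None,
        MaxKeys: int = PAGE_LIMIT,
    ) -> Dict[str, Any]:
        matching = [k for k in self._keys if k.startswith(Prefix)]
        start = int(ContinuationToken) if ContinuationToken else 0
        size = min(MaxKeys, PAGE_LIMIT)
        page = matching[start:start + size]
        truncated = start + size < len(matching)
        response: Dict[str, Any] = {
            "Contents": [{"Key": k} for k in page],
            "IsTruncated": truncated,
        }
        if truncated:
            # The fake token is just the next offset; real tokens are opaque strings.
            response["NextContinuationToken"] = str(start + size)
        return response


def list_first_page(client: Any, bucket: str, prefix: str) -> List[str]:
    """Single list call: anything beyond the first 1000 keys is never seen."""
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


def list_all_keys(client: Any, bucket: str, prefix: str) -> List[str]:
    """Follow NextContinuationToken until IsTruncated is false."""
    keys: List[str] = []
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token:
            kwargs["ContinuationToken"] = token
        response = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            return keys
        token = response["NextContinuationToken"]


def make_demo_keys() -> List[str]:
    """Assumed layout: 10 subflows x 4 clones x 30 files = 1200 keys."""
    return [
        f"kestra/namespace_test/subflow_{s}/{c}/file_{f}.sql"
        for s in range(1, 11)
        for c in range(1, 5)
        for f in range(30)
    ]


client = FakeS3Client(make_demo_keys())
seen = list_first_page(client, "demo-bucket", "kestra/namespace_test/")
everything = list_all_keys(client, "demo-bucket", "kestra/namespace_test/")
print(len(seen), len(everything))
```

With a real boto3 client, `client.get_paginator("list_objects_v2")` does the same token handling. If the UI or the file-access path relied on a single unpaginated listing, it would match the symptom described: files exist in the bucket but do not show up until other objects under the prefix are deleted.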

Ben8t (Member) commented Jan 13, 2025

Thanks Martha. @fhussonnois, that sounds similar to the 1000-object limitation we discussed a few months ago, right 🤔?
