
chore(data-warehouse): Wait for compaction to complete #29325

Merged: 2 commits merged on Feb 28, 2025

Gilbert09 (Member)

Problem

  • We don't wait for the deltalake compaction to complete before moving on to copying files over to S3

Changes

  • Wait for the compaction job to complete
  • Log the compaction results
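The ordering change can be sketched as follows. This is a minimal illustration, not the PR's code: the compaction job is modelled as an asyncio task rather than the real workflow, and `run_compaction`, `copy_files_to_s3` and `sync_table` are hypothetical names. With `wait_for_compaction=True` the S3 copy only starts once compaction has returned. With `False` it reproduces the race described under Problem.

```python
import asyncio

# Placeholder settings flags mirroring the PR's DEBUG/TEST guard (assumed, not the real settings module)
DEBUG = False
TEST = False


async def run_compaction(table: str, events: list[str]) -> dict:
    """Hypothetical stand-in for the Delta Lake compaction job."""
    await asyncio.sleep(0.01)  # simulate compaction taking some time
    events.append(f"compacted {table}")
    return {"table": table, "status": "completed"}


async def copy_files_to_s3(table: str, events: list[str]) -> None:
    """Hypothetical stand-in for copying the table's files over to S3."""
    events.append(f"copied {table}")


async def sync_table(table: str, wait_for_compaction: bool, events: list[str]) -> None:
    # Start compaction in the background, like triggering a separate job
    task = asyncio.create_task(run_compaction(table, events))
    if wait_for_compaction:
        # Block until compaction finishes so the copy sees the compacted files
        result = await task
        events.append(f"result {result['status']}")
    await copy_files_to_s3(table, events)
    if not task.done():
        # Without waiting, the copy above has already raced ahead of compaction
        await task


if __name__ == "__main__":
    log: list[str] = []
    asyncio.run(sync_table("my_table", wait_for_compaction=not DEBUG and not TEST, events=log))
    print(log)  # ['compacted my_table', 'result completed', 'copied my_table']
```

Mirroring the `if not DEBUG and not TEST:` guard from the diff, a caller passes `wait_for_compaction=not DEBUG and not TEST`, so the wait is skipped in debug and test runs.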

Gilbert09 requested a review from a team on February 27, 2025 22:26
greptile-apps (bot, Contributor) left a comment

PR Summary

This PR enhances the Delta Lake compaction process by ensuring it completes before proceeding with S3 file operations.

  • Added wait logic in trigger_compaction_job() to block until compaction workflow completes when not in DEBUG/TEST mode
  • Added detailed logging for compaction operations in compact_table() method, capturing and logging both compact and vacuum statistics
  • Added support for "deltalake-compaction-job" workflow type in the logger to properly extract log source ID and set log source name
  • Improved observability of the compaction process through structured JSON logging of operation results
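The structured JSON logging of compaction results could look roughly like the sketch below. It assumes the delta-rs Python bindings, where `DeltaTable.optimize.compact()` returns a dict of metrics and `DeltaTable.vacuum()` returns the list of removed file paths. `log_compaction_results` is a hypothetical helper, and the table URI and metric values in the usage are made up for illustration.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("deltalake-compaction")


def log_compaction_results(table_uri: str, compact_stats: dict, vacuumed_files: list[str]) -> str:
    """Emit one JSON log line combining compact and vacuum statistics.

    compact_stats: metrics dict, e.g. from DeltaTable.optimize.compact() (assumed source)
    vacuumed_files: file paths, e.g. from DeltaTable.vacuum() (assumed source)
    """
    payload = {
        "table_uri": table_uri,
        "compact": compact_stats,
        "vacuum": {"files_deleted": len(vacuumed_files), "files": vacuumed_files},
    }
    # sort_keys keeps the output stable; default=str handles non-JSON values such as datetimes
    message = json.dumps(payload, sort_keys=True, default=str)
    logger.info("Compaction results: %s", message)
    return message


if __name__ == "__main__":
    # Illustrative values only, not real output from a compaction run
    log_compaction_results(
        "s3://example-bucket/example_table",
        {"numFilesAdded": 1, "numFilesRemoved": 4},
        ["part-0001.parquet", "part-0002.parquet"],
    )
```

Logging both results as a single JSON object keeps the compact and vacuum numbers for one run together, which makes them easier to filter and compare across runs.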

3 file(s) reviewed, 2 comment(s)

@@ -35,6 +36,10 @@ def trigger_compaction_job(job: ExternalDataJob, schema: ExternalDataSchema) ->
),
)
)

if not DEBUG and not TEST:
A contributor commented:

Is there a reason to not do this in the test environment?

Gilbert09 (Member, Author) replied:

Yeah, we don't run both jobs concurrently in tests. I think it'd be a pain in the ass to set up properly.

Gilbert09 enabled auto-merge (squash) on February 28, 2025 08:50
Gilbert09 merged commit 26e65af into master on Feb 28, 2025
89 checks passed
Gilbert09 deleted the tom/compaction-waiting-and-logs branch on February 28, 2025 11:47

2 participants