Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[server][dvc] add table format info in the blob transfer #1428

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jingy-li
Copy link
Contributor

@jingy-li jingy-li commented Jan 10, 2025

[server][dvc] add table format info in the blob transfer

There is an edge case where multiple file formats may coexist within a single cluster. For instance, DVC1 might use a plain-table format while DVC2 uses a block-based format. If DVC2 requests data from DVC1 during bootstrap, it could end up with a plain-table store in DVC2, leading to read issues.

This PR addresses the issue by:

  • Receiver side: Including the table format in the bootstrap request URI. An example URI: [0]""/[1]"store"/[2]"version"/[3]"partition/[4]"table format".

  • Sender side: Storing the current snapshot format in the snapshot manager. When a bootstrap request arrives, the snapshot manager compares the existing snapshot format with the requested table format. If they do not match, it returns a 404 error to reject the blob transfer request.

How was this PR tested?

Integration test "testBlobTransferThrowExceptionIfTableFormatNotMatch".

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.


// Check the snapshot table format
BlobTransferTableFormat currentSnapshotTableFormat = blobSnapshotManager.getBlobTransferTableFormat();
if (!blobTransferRequest.getRequestTableFormat().name().equals(currentSnapshotTableFormat.name())) {
Copy link

@lusong64 lusong64 Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would using enum equal be more efficient than using String comparison? Enum is singleton so the comparison is safe.

if (blobTransferRequest.getRequestTableFormat() != currentSnapshotTableFormat) {
// yada yada...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, updated in the next commit. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants