Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] add TFileMerger options to mirror RNTupleMergeOptions #17299

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

silverweed
Copy link
Contributor

  • tested changes locally
  • updated the docs (if necessary)

Copy link

github-actions bot commented Dec 18, 2024

Test Results

    17 files      17 suites   4d 11h 23m 56s ⏱️
 2 694 tests  2 693 ✅ 0 💤 1 ❌
44 144 runs  44 143 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit fb88af4.

♻️ This comment has been updated with latest results.

@@ -55,6 +55,15 @@ struct RSealedPageMergeData;

class RClusterPool;

/// Set of merging options to pass to RNTupleMerger.
/// If you're using the merger through TFileMerger you need to give it string-based options instead.
/// Here is the mapping for the TFileMerger options:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information should also be added to the release notes.

@@ -427,10 +427,10 @@ int main( int argc, char **argv )
else
newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault;
delete firstInput;
fileMerger.SetMergeOptions(TString("first_source_compression"));
fileMerger.SetMergeOptions(TString("rnt:FirstSrcCompression"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need to be handled differently than the 'same' option for TTree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how TTree handles the case, but my reasoning is that, since RNTuple has a different default compression than TFile/TTree, we want to distinguish the cases of "use default compression" (user didn't pass any flag to hadd) and that of "use first file compression" (user passed -ff or -fk).
As far as I know, we have no way of telling which case are we in from RNTuple::Merge, so we rely on explicit merge options for that.

Do you think there is a better way of handling it? I might be missing something that would let us avoid these options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, there is no good reason to make this option being 'RNTuple' only. The fact that TTree is currently cumbersomely doing an approximate implementation is interesting but not relevant (IMHO).

Currently TTree uses the first file's global compression setting to set the compression to use either:

  • This is the right choice, then for symmetry reason it should be the same choice for RNTuple (and if RNTuple wants/needs more, we may need a new option 'use the compression of the first RNTuple/TTree').
  • This is not the right choice, then TTree should be updated to switch behavior when seeing the flag to be like RNTUple and use the compression of the first TTree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a feasible change to make, I'd propose we go for point 2 and update the TTree behavior to use the first TTree compression when the flag is present (in which case the flag would just be "FirstSrcCompression")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I think we can have the "update the TTree behavior to use the first TTree compression when the flag is present" to be a subsequent PR.

Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle looks good to me, but see also Philippe's comments.

@@ -427,10 +427,10 @@ int main( int argc, char **argv )
else
newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault;
delete firstInput;
fileMerger.SetMergeOptions(TString("first_source_compression"));
fileMerger.SetMergeOptions(TString("rnt:FirstSrcCompression"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally on the naming style: shall we align it with the style of the RNTuple dictionary options? That would make it rntuple.firstSrcCompression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the dot instead of colon, and I guess rntuple can be a bit more informative, but I like the capitalized first letter better because it makes for a cleaner mapping with the root naming style: when mapping options directly to fields of a struct you simply have to drop the f, and when you map to enum names you simply drop the k (e.g. MergingMode=Union).
That said, I have no strong opinion except that we should be coherent across all our API.

} else {
newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault;
fileMerger.SetMergeOptions(TString("default_compression"));
fileMerger.SetMergeOptions(TString("rnt:DefaultCompression"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That options is also not RNTuple specific. The user requested the 'default' compression, this is true through-out. The fact that everything else, 'just' setting the request compression to the global default means that until now pushing this information down was redundant but now it is not redundant anymore (no way to know if newcomp is set to the default by happenstance or intentionally). The fact that only RNTuple will (for now?) use this information does not make it RNTuple specific.

fileMerger.OutputFile(fileGuard3.GetPath().c_str(), "RECREATE");
fileMerger.AddFile(nt1.get());
fileMerger.AddFile(nt2.get());
fileMerger.SetMergeOptions(TString("rnt:MergingMode=Filter"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this option (rntuple:MergingMode), I would agree that it is specific to RNTuple has the list of mode would (likely?) be different for different types. (And even the selection might be different).

@silverweed silverweed force-pushed the tfile_merger_opts_ntuple branch from 27f7ebe to d37dfeb Compare January 7, 2025 08:18
@silverweed silverweed requested a review from dpiparo as a code owner January 8, 2025 08:20
@silverweed silverweed force-pushed the tfile_merger_opts_ntuple branch from 2a8fddb to 12d911d Compare January 8, 2025 08:21
- use rntuple.Foo instead of rnt:Foo
- remove rntuple prefix from some non-necessarily-rntuple-specific options
@silverweed silverweed force-pushed the tfile_merger_opts_ntuple branch from 12d911d to fb88af4 Compare January 9, 2025 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants