-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ntuple] add TFileMerger options to mirror RNTupleMergeOptions #17299
base: master
Are you sure you want to change the base?
[ntuple] add TFileMerger options to mirror RNTupleMergeOptions #17299
Conversation
silverweed
commented
Dec 18, 2024
- tested changes locally
- updated the docs (if necessary)
6201ad4
to
27f7ebe
Compare
Test Results 17 files 17 suites 4d 11h 23m 56s ⏱️ For more details on these failures, see this check. Results for commit fb88af4. ♻️ This comment has been updated with latest results. |
@@ -55,6 +55,15 @@ struct RSealedPageMergeData; | |||
|
|||
class RClusterPool; | |||
|
|||
/// Set of merging options to pass to RNTupleMerger. | |||
/// If you're using the merger through TFileMerger you need to give it string-based options instead. | |||
/// Here is the mapping for the TFileMerger options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information should also be added to the release notes.
main/src/hadd.cxx
Outdated
@@ -427,10 +427,10 @@ int main( int argc, char **argv ) | |||
else | |||
newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault; | |||
delete firstInput; | |||
fileMerger.SetMergeOptions(TString("first_source_compression")); | |||
fileMerger.SetMergeOptions(TString("rnt:FirstSrcCompression")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be handled differently than the 'same' option for TTree
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how TTree handles the case, but my reasoning is that, since RNTuple has a different default compression than TFile/TTree, we want to distinguish the cases of "use default compression" (user didn't pass any flag to hadd) and that of "use first file compression" (user passed -ff or -fk).
As far as I know, we have no way of telling which case are we in from RNTuple::Merge, so we rely on explicit merge options for that.
Do you think there is a better way of handling it? I might be missing something that would let us avoid these options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, there is no good reason to make this option being 'RNTuple' only. The fact that TTree
is currently cumbersomely doing an approximate implementation is interesting but not relevant (IMHO).
Currently TTree
uses the first file's global compression setting to set the compression to use either:
- This is the right choice, then for symmetry reason it should be the same choice for RNTuple (and if RNTuple wants/needs more, we may need a new option 'use the compression of the first RNTuple/TTree').
- This is not the right choice, then TTree should be updated to switch behavior when seeing the flag to be like RNTUple and use the compression of the first TTree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a feasible change to make, I'd propose we go for point 2 and update the TTree behavior to use the first TTree compression when the flag is present (in which case the flag would just be "FirstSrcCompression")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I think we can have the "update the TTree behavior to use the first TTree compression when the flag is present" to be a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle looks good to me, but see also Philippe's comments.
main/src/hadd.cxx
Outdated
@@ -427,10 +427,10 @@ int main( int argc, char **argv ) | |||
else | |||
newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault; | |||
delete firstInput; | |||
fileMerger.SetMergeOptions(TString("first_source_compression")); | |||
fileMerger.SetMergeOptions(TString("rnt:FirstSrcCompression")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally on the naming style: shall we align it with the style of the RNTuple dictionary options? That would make it rntuple.firstSrcCompression
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the dot instead of colon, and I guess rntuple
can be a bit more informative, but I like the capitalized first letter better because it makes for a cleaner mapping with the root naming style: when mapping options directly to fields of a struct you simply have to drop the f
, and when you map to enum names you simply drop the k
(e.g. MergingMode=Union
).
That said, I have no strong opinion except that we should be coherent across all our API.
main/src/hadd.cxx
Outdated
} else { | ||
newcomp = ROOT::RCompressionSetting::EDefaults::kUseCompiledDefault; | ||
fileMerger.SetMergeOptions(TString("default_compression")); | ||
fileMerger.SetMergeOptions(TString("rnt:DefaultCompression")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That options is also not RNTuple
specific. The user requested the 'default' compression, this is true through-out. The fact that everything else, 'just' setting the request compression to the global default means that until now pushing this information down was redundant but now it is not redundant anymore (no way to know if newcomp
is set to the default by happenstance or intentionally). The fact that only RNTuple
will (for now?) use this information does not make it RNTuple
specific.
fileMerger.OutputFile(fileGuard3.GetPath().c_str(), "RECREATE"); | ||
fileMerger.AddFile(nt1.get()); | ||
fileMerger.AddFile(nt2.get()); | ||
fileMerger.SetMergeOptions(TString("rnt:MergingMode=Filter")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this option (rntuple:MergingMode
), I would agree that it is specific to RNTuple
has the list of mode
would (likely?) be different for different types. (And even the selection might be different).
27f7ebe
to
d37dfeb
Compare
2a8fddb
to
12d911d
Compare
- use rntuple.Foo instead of rnt:Foo - remove rntuple prefix from some non-necessarily-rntuple-specific options
12d911d
to
fb88af4
Compare