-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Variable page size for batch PQ writes #9508
base: queue-timeouts-plus-batches
Are you sure you want to change the base?
[WIP] Variable page size for batch PQ writes #9508
Conversation
db1ec23
to
9ea962b
Compare
First attempt at variable page sizes that passes all tests
9ea962b
to
04c0841
Compare
@andrewvc sorry for the delay, taking a look now :) |
this.offsetMap = new IntVector(); | ||
this.checkSummer = new CRC32(); | ||
this.file = dirPath.resolve("page." + pageNum).toFile(); | ||
int existingFileSize = (int) file.length(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewvc this is somewhat optimistic I think. We should do a safe cast here (using Guava's Ints
for example) here and hard fail if the length exceeds 2GB because we can't mmap
that kind of file.
I know what we won't be able to write that kind of file in the first place, so this shouldn't ever come up but with the amount of weird bugs we ran into I think we should have visibility on that kind of problem if it occurs :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to handling the edge cases.
@andrewvc this LGTM for a merge to a feature branch for sure btw. Wouldn't wanna block you here if this blocks you with my nitpick on the typecast :) |
@original-brownbear I'll do a check against |
@andrewvc sure sounds good :) |
Part of #9389 , this implements a variably sized page allowing us to write the entire batch to the given page.
This is in a WIP state, its ready for a pre-review to assess its approach, but needs tests and some code cleanup potentially.
This introduces a new variable and the idea of
pageDataCapacity
on top ofpageCapacity
. ThepageCapacity
is really the page file size,pageDataCapacity
is that size - the size of a file header for when we write the batch, since we need to know if it will fit the exact batch size.